
LogCollect failures with not-implemented createDebuggingCommand method #12283

Open
amaltaro opened this issue Feb 27, 2025 · 0 comments · May be fixed by #12284
@amaltaro

Impact of the bug
WMAgent (current version 2.3.9.2)

Describe the bug
As Artur reported in the CompOps Mattermost channel, LogCollect jobs are failing with the error message shown in [1].

This error appears because LogCollect jobs transfer the log tarball to CERN using the XRDCP stage-out implementation, which currently does not implement the createDebuggingCommand() method.

Note that this problem only surfaced because the CERN EOS /store namespace is out of quota, which caused all stage-out retries to fail and the debugging block to be executed.
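The failure mode can be illustrated with a minimal, self-contained sketch. The classes below (StageOutImplSketch, XRDCPSketch) and the always-failing executeCommand() are hypothetical stand-ins, not the real WMCore code; only the method name createDebuggingCommand, its argument order, and the exception text are taken from the traceback in [1]. The point is that the base-class stub is reached only after every retry has failed, so a plugin lacking the override works normally until a transfer actually fails. In the real job, StageOutMgr then wraps the NotImplementedError into the StageOutFailure seen in [1], so the underlying transfer error does not appear in the message.

```python
class StageOutImplSketch:
    """Hypothetical, heavily simplified stand-in for WMCore's StageOutImpl."""

    def __init__(self, numRetries=3):
        self.numRetries = numRetries

    def createStageOutCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        raise NotImplementedError("StageOutImpl.createStageOutCommand")

    def createDebuggingCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        # Base-class stub: each plugin is expected to override this
        raise NotImplementedError("StageOutImpl.createDebuggingCommand")

    def executeCommand(self, command):
        # Hypothetical: every transfer fails (e.g. EOS quota exceeded)
        return 1

    def __call__(self, sourcePFN, targetPFN, options=None, checksums=None):
        command = self.createStageOutCommand(sourcePFN, targetPFN, options, checksums)
        for _ in range(self.numRetries):
            if self.executeCommand(command) == 0:
                return targetPFN
        # Only reached once all retries have failed: run extra diagnostics.
        # A plugin without the override raises NotImplementedError here.
        command = self.createDebuggingCommand(sourcePFN, targetPFN, options, checksums)
        self.executeCommand(command)
        raise RuntimeError("stage out failed after %d retries" % self.numRetries)


class XRDCPSketch(StageOutImplSketch):
    """Hypothetical xrdcp plugin that, like the one in this report, lacks the override."""

    def createStageOutCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        return "xrdcp %s %s" % (sourcePFN, targetPFN)


try:
    XRDCPSketch()("/srv/job/logs.tar", "root://eoscms.cern.ch//eos/cms/store/logs/logs.tar")
except NotImplementedError as exc:
    print("Masked failure:", exc)
```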

How to reproduce it
None

Expected behavior
We need to implement the createDebuggingCommand() interface defined here:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/StageOutImpl.py#L145

in every stage-out plugin (the Backends Python package) that can reach this line:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/StageOutImpl.py#L239
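A possible shape for the fix is sketched below, under several assumptions: the class names are hypothetical, the return value is assumed to be a shell snippet (as with the stage-out command), and the specific diagnostics (ls, xrdcp --version, xrdfs stat on the target directory) are illustrative choices, not what the real plugins do. The helper missingDebuggingCommand() is also hypothetical; it walks all subclasses of a base class and lists those that still inherit the base-class stub, which is one way to find every plugin that needs the method.

```python
import posixpath
import shlex
from urllib.parse import urlsplit


class StageOutImplSketch:
    """Hypothetical minimal base class mirroring the stub in StageOutImpl.py."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        raise NotImplementedError("StageOutImpl.createDebuggingCommand")


class XRDCPSketch(StageOutImplSketch):
    """Hypothetical xrdcp plugin with a debugging command implemented."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        # Shell snippet meant to run only after all stage-out retries failed.
        # The chosen diagnostics are illustrative assumptions.
        lines = [
            "echo 'Debugging failed xrdcp stage out'",
            "ls -l %s" % shlex.quote(sourcePFN),  # is the local file there, and how big?
            "xrdcp --version",
        ]
        url = urlsplit(targetPFN)
        if url.scheme == "root" and url.netloc:
            # Inspect the target directory on the remote XRootD server
            remoteDir = posixpath.dirname(url.path)
            lines.append("xrdfs %s stat %s" % (shlex.quote(url.netloc), shlex.quote(remoteDir)))
        return "\n".join(lines) + "\n"


def missingDebuggingCommand(baseClass):
    """Return sorted names of subclasses that still inherit the base-class stub."""
    stub = baseClass.createDebuggingCommand
    missing = []
    stack = list(baseClass.__subclasses__())
    while stack:
        cls = stack.pop()
        stack.extend(cls.__subclasses__())
        # An inherited method is the very same function object as the stub
        if cls.createDebuggingCommand is stub:
            missing.append(cls.__name__)
    return sorted(missing)
```

Applied to the real base class, the audit would only see plugins whose modules have already been imported, so every module under the Backends package would have to be imported first; that step is left out of this sketch.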

For plugins that will soon be deprecated (with token adoption?), we can skip the actual debugging code if needed and just leave a simple placeholder, such as a print/echo.
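For a plugin slated for deprecation, the placeholder could be as small as the sketch below. The class names are hypothetical, and returning an echo command assumes the debugging command is executed as a shell snippet, like the stage-out command. The only goal is that the failure path no longer raises NotImplementedError, so the original stage-out error is not masked.

```python
class StageOutImplSketch:
    """Hypothetical minimal base class with the same stub as StageOutImpl.py."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        raise NotImplementedError("StageOutImpl.createDebuggingCommand")


class SoonDeprecatedSketch(StageOutImplSketch):
    """Hypothetical plugin scheduled for deprecation: placeholder only."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options=None, checksums=None):
        # No real diagnostics: just emit a message instead of raising,
        # so the failure path reports the original stage-out error.
        return "echo 'No debugging command implemented for %s'\n" % type(self).__name__
```

An alternative would be to have the base class itself return such a placeholder instead of raising; that avoids the crash for every plugin at once, but it also hides which plugins lack real diagnostics.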

Additional context and error message
[1]

WMBS job id: 15678072
Task: /pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/DataProcessing/LogCollect
Status: jobfailed
Site: T1_DE_KIT
Agent: cmsgwms-submit14.fnal.gov

logCollect1
    WMAgentStepExecutionError (Exit Code: 60408)

        <@========== WMException Start ==========@>
        Exception Class: LogCollectStageOutError
        Message: Unable to stageOut LogCollect to Castor:
        <@========== WMException Start ==========@>
        Exception Class: StageOutFailure
        Message: Failure for override stage out:
        StageOutImpl.createDebuggingCommand
        	ClassName : None
        	ModuleName : WMCore.Storage.StageOutError
        	MethodName : __init__
        	ClassInstance : None
        	FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
        	LineNumber : 32
        	ErrorNr : 0
        	Command : xrdcp
        	LFN : /store/logs/prod/2025/02/WMAgent/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
        	InputPFN : /srv/job/WMTaskSpace/logCollect1/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
        	TargetPFN : root://eoscms.cern.ch//eos/cms/store/logs/prod/2025/02/WMAgent/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
        	ErrorCode : 60311
        	ErrorType : GeneralStageOutFailure

        Traceback: 
          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 361, in stageOut
            impl(self.overrideConf['command'], localPfn, pfn, self.overrideConf["option"], checksums)

          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutImpl.py", line 239, in __call__
            command = self.createDebuggingCommand(sourcePFN, targetPFN, options, checksums)

          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutImpl.py", line 150, in createDebuggingCommand
            raise NotImplementedError("StageOutImpl.createDebuggingCommand")

        <@---------- WMException End ----------@>
        	ClassName : None
        	ModuleName : WMCore.WMSpec.Steps.WMExecutionFailure
        	MethodName : __init__
        	ClassInstance : None
        	FileName : /srv/job/WMCore.zip/WMCore/WMSpec/Steps/WMExecutionFailure.py
        	LineNumber : 18
        	ErrorNr : 60408

        Traceback: 
          File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/Executors/LogCollect.py", line 234, in execute
            eosStageOutMgr(tarInfo)

          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 289, in __call__
            raise lastException

          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 272, in __call__
            pfn = self.stageOut(lfn, fileToStage['PFN'], fileToStage.get('Checksums'))

          File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 365, in stageOut
            raise StageOutFailure(msg, Command=self.overrideConf['command'],
Projects
Status: In Progress