
'no more transporters are available' loop #145

mrsippy opened this issue May 8, 2013 · 9 comments

@mrsippy commented May 8, 2013

Hi,

I recently implemented fileconveyor to sync static content for a number of sites to Rackspace Cloud Files.

I am not doing any processing on any of the files, just syncing them as they are found. As such, my config file has a source entry for each site, a server entry for each site, and one rule for each site.

When I run fileconveyor, it works up to a point: it will usually run successfully for 10–15 minutes and will then stop syncing for no apparent reason. I contacted Wim, who suggested I increase the logging level to "DEBUG". I have done that, and I can now see that fileconveyor stops syncing because it gets into a loop, logging the following message roughly 5 times a second:

Transporting: no more transporters are available for server 'xxxxxxx'

Where xxxxxxx is one of the aforementioned server entries from my config file.

This morning I watched it remain stuck in this loop for 20 minutes before I stopped it.

Also, if I start fileconveyor again, it inevitably gets into a loop again, but not necessarily on the same server.

My settings.py looks like this:

RESTART_AFTER_UNHANDLED_EXCEPTION = True
RESTART_INTERVAL = 5
LOG_FILE = '/var/log/fileconveyor.log'
PID_FILE = '/var/run/fileconveyor.pid'
PERSISTENT_DATA_DB = '/usr/local/src/fileconveyor/fileconveyor/persistent_data.db'
SYNCED_FILES_DB = '/usr/local/src/fileconveyor/fileconveyor/synced_files.db'
WORKING_DIR = '/tmp/fileconveyor'
MAX_FILES_IN_PIPELINE = 100
MAX_SIMULTANEOUS_PROCESSORCHAINS = 2
MAX_SIMULTANEOUS_TRANSPORTERS = 20
MAX_TRANSPORTER_QUEUE_SIZE = 3
QUEUE_PROCESS_BATCH_SIZE = 40
CALLBACKS_CONSOLE_OUTPUT = False
CONSOLE_LOGGER_LEVEL = logging.WARNING
FILE_LOGGER_LEVEL = logging.DEBUG
RETRY_INTERVAL = 5

My config.xml file is too lengthy to paste in full, but is essentially structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Sources -->
  <sources ignoredDirs="">
    <source name="website_1" scanPath="/var/www/website_1/htdocs/wp-content/uploads" />
    <source name="website_2" scanPath="/var/www/website_2/htdocs/wp-content/uploads" />
    <!-- etc. -->
  </sources>

  <!-- Servers -->
  <servers>
    <server name="server_website_1" transporter="cloudfiles">
      <username>xxxxxxxxxxxx</username>
      <api_key>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</api_key>
      <container>server_1</container>
    </server>
    <server name="server_website_2" transporter="cloudfiles">
      <username>xxxxxxxxxxxx</username>
      <api_key>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</api_key>
      <container>server_2</container>
    </server>
  </servers>

 <!-- Rules -->
  <rules>
    <rule for="website_1" label="Website_1">
      <destinations>
        <destination server="server_website_1" path="/wp-content/uploads" />
      </destinations>
    </rule>
    <rule for="website_2" label="Website_2">
      <destinations>
        <destination server="server_website_2" path="/wp-content/uploads" />
      </destinations>
    </rule>
  </rules>
</config>

Any ideas?

Many thanks in anticipation -

chris

@mrsippy commented May 13, 2013

Any ideas? Anyone? I would try to fix it myself, but I am not a Python developer and am not sure where to start.

@wimleers (Owner)

It's these two settings that determine how many simultaneous transporters there can be, and how many files can be queued for each:

MAX_SIMULTANEOUS_TRANSPORTERS = 20
MAX_TRANSPORTER_QUEUE_SIZE = 3

The message Transporting: no more transporters are available for server 'xxxxxxx' doesn't mean File Conveyor is stuck; it means that all 20 transporters are already A) transporting files and B) each already has 3 files queued.

File Conveyor will just retry a bit later :)

Probably either or both of these things are true:

  1. (Some of) your files are rather large and hence take a long time to transport.
  2. Rackspace Cloud Files is being rather slow.
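
If slow transfers really are the cause, one thing you could try is raising those two limits in settings.py, for example (illustrative values only, not a tested recommendation):

MAX_SIMULTANEOUS_TRANSPORTERS = 40
MAX_TRANSPORTER_QUEUE_SIZE = 6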

ghost assigned wimleers May 15, 2013
@mrsippy commented May 16, 2013

Thanks for getting back to me, Wim. I have my doubts about the possible causes you suggest, because I've seen fileconveyor in this state for 24+ hours when there has been little to sync. I will try increasing the number of transporters and the queue size and test further.

chris

@mrsippy commented May 17, 2013

Hi Wim,

I started fileconveyor again shortly after leaving my last comment. Incidentally, in case it has any bearing, I'm running fileconveyor using nohup, i.e.

nohup python /usr/local/src/fileconveyor/fileconveyor/arbitrator.py > /var/log/nohup.log 2>&1 &

The last file that fileconveyor synced was at 11:06am yesterday, some 30+ hours ago, and there have been many files added to my sites since then which should have been synced.

Any ideas?

@mrsippy commented Jun 5, 2013

This is still an issue for me I'm afraid.

@wimleers (Owner) commented Sep 3, 2013

Can you enable debug logging and then analyze your log to check if something bizarre/interesting is happening? Alternatively, upload the log here.
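
For reference, logging verbosity is controlled by the two logger settings in settings.py; a minimal sketch, assuming you want debug output in both the log file and the console (the settings.py posted above already has the file logger at DEBUG):

CONSOLE_LOGGER_LEVEL = logging.DEBUG
FILE_LOGGER_LEVEL = logging.DEBUG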

@leesolway

Same issue for me, unfortunately. I can't see anything unusual in the log.

@wimleers (Owner)

Then can you please post your log somewhere so I can take a look at it? (You can post it here, though I'm not sure how large a file GitHub will accept.)

@trolleycrash

We were also experiencing this same issue. After much hair-pulling, I zeroed in on the problem. We had an empty processor chain, which seemed to result in the processor callback not always firing. The effect was that the processor queue would fill right up and we would exceed MAX_FILES_IN_PIPELINE; then everything would stall.

What we did to solve it was just add an innocuous processor to the processor chain in config.xml:

<processorChain>
  <processor name="unique_filename.Mtime" />
</processorChain>
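
In context, the processorChain sits inside a rule, alongside that rule's destinations. Applied to the website_1 rule from the config earlier in this thread, the result would look roughly like this (a sketch for illustration; only the processorChain lines are new):

<rule for="website_1" label="Website_1">
  <processorChain>
    <processor name="unique_filename.Mtime" />
  </processorChain>
  <destinations>
    <destination server="server_website_1" path="/wp-content/uploads" />
  </destinations>
</rule>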

For reference, I believe this may also have been what was causing issue #129.
