Skip to content

Dockerised service pulling data from remote server & extracting archives

Notifications You must be signed in to change notification settings

dezerving/rclone-fetcher

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Seedbox data fetcher & extractor

Dockerised service periodically pulling data from a remote seedbox & extracting archived files.

Note data is synced unidirectionally, and if already downloaded & processed asset gets deleted on the remote, then it's also nuked locally. This is generally the preferred method, as *arr service (or whatever other media manager you happen to use) should be responsible for torrent removal upon successful import anyways.

It's also important to know if an already-downloaded source file is modified at the remote (apart from deletion), then those modifications will no longer be reflected locally.

Rationale

This service aims to solve a common issue with the servarr projects around data import (+ also provides extraction) described here. tl;dr of it is if *arr is monitoring a directory and expecting say full season worth of files, but by the time it goes to check/import only half of episodes have been downloaded from your remote seedbox, then only those episodes present would be imported.

We solve this by using rclone to first download assets into an intermediary directory not monitored by *arr services, optionally process them (eg extract archives), and then move them atomically to a destination directory that *arr expects them in.

servarrs' completed download handling is documented/described here; archived asset handling isn't described in much detail, but can be found here.

Configuration

Required environment variables

  • REMOTE: your configured rclone remote;
  • SRC_DIR: path to the source directory on your remote to pull data from;
  • DEST_FINAL: path to the mounted directory in container to finally move pulled data to; note this would be the directory monitored by your *arr services et al;

Optional environment variables

  • DEST_INITIAL: path to the mounted dir in container where rclone initially downloads data to; if not defined, then a directory is created inside $DEST_FINAL; it's highly recommended you define this. Also make sure it lies on the same filesystem as DEST_FINAL, so mv command is atomic;
  • DEPTH: sets the depth level at which files are searched/synced at; defaults to 1; see below for closer depth explanation;
  • RM_EMPTY_PARENT_DIRS: set this to any non-empty value to delete empty parent dirs; used only if DEPTH > 1;
  • CRON_PATTERN: cron pattern to be used to execute the syncing script; eg */10 * * * * to execute every 10 minutes; defaults to every 5 min;
  • SKIP_EXTRACT: set this to any non-empty value to skip archived file extraction;
  • SKIP_ARCHIVE_RM: set this to any non-empty value to skip removal of archives that were successfully extracted;
  • SKIP_LOCAL_RM: set this to any non-empty value to skip removing assets in $DEST_FINAL whose counterpart has been removed on the remote;
  • RCLONE_FLAGS: semicolon-separated options to use with all rclone commands; note this overwrites the default rclone flags altogether, so make sure you know what you're doing;
  • RCLONE_OPTS: semicolon-separated additional options to be passed to all rclone commands; useful eg if you want to override the --bwlimit option (which defaults to 20M) or increase logging verbosity;
  • WATCHDIR_DEST: path to the watchdir on your remote;
  • WATCHDIR_SRC: path to the watchdir mounted in container;
  • PGID: group id; defaults to 100 (users)
  • PUID: user id; defaults to 99 (nobody)

Required mountpoints & files

  • DEST_FINAL (& DEST_INITIAL if env var is defined) directories need to be backed by mounted directory (or directories if they reside on different mountpoints)
    • same for WATCHDIR_SRC, if defined
  • mount point to configuration root dir at /config also needs to be provided;
  • valid rclone config file rclone.conf needs to be present in /config mount dir; this conf needs to define the remote referenced by REMOTE env var;

Example docker command:

docker run -d \
    --name seedbox-fetcher \
    -e REMOTE=seedbox \
    -e SRC_DIR=files/complete \
    -e DEST_INITIAL=/data/rclone-tmp \
    -e DEST_FINAL=/data/complete \
    -e PUID=1000 \
    -v /host/dir/downloads/torrents:/data \
    -v $HOME/.config/seedbox-fetcher:/config \
    layr/seedbox-rclone-fetch-extract

On syncing logic and DEPTH env var

DEPTH env var selects the depth level in relation to SRC_DIR in which files&dirs are downloaded/removed from. If any of replicated/downloaded nodes get deleted on the remote server, they will also be deleted from DEST_FINAL.

If additional file or dir gets written into an already-downloaded directory, then this addition wouldn't be downloaded, as downloaded nodes are considered finalized, meaning no changes to them are replicated, only their removal. This applies also for child removals -- ie if a child file in an already-replicated directory is removed on remote, then this removal won't be reflected in our local copy.

In other words, download/remove happens only if addition/removal is detected at given DEPTH.

Say your SRC_DIR on the remote server looks like:

$ tree SRC_DIR
SRC_DIR/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
├── dir2
│   └── file2
└── file3

DEPTH=1 (default)

If DEPTH=1, then dir1/, dir2/ & file3 would be replicated to DEST_FINAL. If any of them gets deleted on the remote server, it will also be deleted from DEST_FINAL. If additional file or dir gets written into or removed from dir1/ or dir2/, then this addition or removal wouldn't be downloaded.

Replicated copy would look like an exact copy of the remote:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
├── dir2
│   └── file2
└── file3

Now let's say file3 and file2 were removed on remote, and newfile was written into dir1/. After sync our local copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2
    └── file2

Note file3 removal was reflected in our local copy as expected. But newfile addition nor file2 removal weren't. This is because their parent directories (dir1/ and dir2/ respectively) had already been replicated, and thus are considered finalized.

DEPTH=2

If DEPTH=2, then dir12/, file1 & file2 would be replicated to DEST_FINAL while preserving the original directory structure - meaning parent directories from the SRC_DIR root will be created also on DEST_FINAL. If any of them gets deleted on the remote server, it will also be deleted from DEST_FINAL. If additional file or dir gets written into or removed from dir12/, then this addition or removal wouldn't be downloaded. Note file3 is completely ignored by the service, as it sits at depth=1 level.

Replicated copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2
    └── file2

Now let's say file121 and file2 were removed on remote, and newfile was written into dir1/dir12/. After sync our local copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2

Note file2 removal was reflected in our local copy as expected. But newfile addition nor file121 removal weren't. This is because their parent directory dir1/dir12/ had already been replicated, and thus is considered finalized.

If you want empty parent directories (dir2/ in above example) to be cleaned up, then set RM_EMPTY_PARENT_DIRS env var to a non-empty value.

Debugging

Sometimes it's useful to debug rclone/config issues directly from the container shell. If you're doing so, make sure to run all commands as abc user, otherwise you may accidentally mess up some files' ownership. e.g.:

su abc -s /bin/sh -c 'rclone lsf -vvv --max-depth 1 --config /config/rclone.conf  your-remote:'
su abc -s /bin/sh -c /sync.sh

TODO

  • notification systems
  • set up logrotate
  • healthchecks
  • do we need --tpslimit and/or --checkers option?
  • confirm optimal --transfers opt;
  • confirm extract.sh/enough_space_for_extraction() works as intended
  • skip downloads of assets that wouldn't fit on local filesystem; eg similar to enough_space_for_extraction(), but for downloading, not extracting;
  • find a better way for compiling copy command - atm we're escaping filenames for the --filter + flags; --files-from might be an option, but unsure whether it'd pull the whole dir if only dir -- as opposed to its contents -- is listed;

About

Dockerised service pulling data from remote server & extracting archives

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Shell 97.7%
  • Dockerfile 2.3%