Skip to content

weinberg/duq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

duq

Quick 'du' for linux.


--- OVERVIEW ---


duq provides a mechanism to generate near-realtime disk usage reports for very large
directory trees. It utilizes the inotify kernel API to watch a set of directories
for changes. Changes are logged in a SQLite database. Disk usage reports are performed
by querying this database. This results in a speed increase of 1000x or more over 'du'
for large directory trees.

There are three parts to duq:

  duqbase - The baseline tool to store the current sizes in the database.
  duqd - The daemon which watches for inotiy events and updates the database.
  duq - The command which generates disk usage reports from the database.

Since the duq command is merely summing up the total sizes for a column in the 
database and not hitting the filesystem at all, the results are generated much
faster than by using 'du'.


--- REQUIREMENTS ---


Perl
Perl modules: 

	DBD::SQLite
	File::Spec
	Filesys::DiskUsage
	Number::Bytes::Human
	Proc::Daemon

inotify-tools:
	https://github.com/rvoicilas/inotify-tools/wiki/


--- SETUP ---


1 Install the startup script:
	# cp startup.sh /etc/init.d/duqd
	# chkconfig install duqd
2 Start duqd:
	# /etc/init.d/duqd startup
3 Edit /etc/duq.conf. Add a single line for each directory you want duq to watch.
4 Run duqbase to generate a baseline.
	This will take a while (approximately as long as doing a normal 'du'
	on those directories). The duq command will alert you that its results
	may be incorrect while duqbase is running.
5 Once duqbase is complete, use duq like du:

	# duq /usr
	6.2G      /usr

  You can provide multiple dirs on the command line (or wildcards) to get results for
  each dir:

	# duq /usr/*
	501M      /usr/bin
	4.0K      /usr/etc
	4.0K      /usr/games
	127M      /usr/include
	195M      /usr/java
	1.6M      /usr/kerberos
	1.8G      /usr/lib
	790M      /usr/lib64
	94M       /usr/libexec
	2.0G      /usr/local
	16K       /usr/lost+found
	39M       /usr/sbin
	1.5G      /usr/share
	128M      /usr/src
	8.0K      /usr/X11R6

6 Re-run the baseline periodically. This will ensure that the sizes are kept in sync:

  30 1 * * 0 /usr/local/bin/duqbase

--- CAVEATS ---

Duq makes use of inotify and a sqlite3 database to track changes to your filesystem.
There are a few drawbacks of this approach which users should be aware of. First, it
is possible for the database to get out of sync with the filesystem. This can happen
any time the duqd daemon is not running, during the boot or shutdown sequence or if it
dies for some reason. This is why it is a good reason to run duqbase periodically to
get the two back in sync.

Secondly and possibly most importantly, duq can use a lot of memory. Inotify uses
about 40 bytes per inode to do it's job and on a filesystem with millions of inodes
this can result in a large amount of memory being taken up by this process. To make
matters worse, since inotify is a kernel module the memory it allocates is not
swappable so you really do lose all the memory it takes. For this reasone some may
prefer to just run duqbase periodically and use duq to get an estimate of current disk
space while not running duqd at all. In this case there is no memory impact at all.

--- AUTHOR ---

Joshua Weinberg
Digabit, Inc.

--- LICENSE ---

About

Quick 'du' for linux.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published