#!/usr/bin/perl -w

=head1 NAME

drsync - rsync wrapper for synchronizing file repositories that change on
both sides

=head1 SYNOPSIS

  drsync.pl [ --rsync=/usr/bin/rsync ] [ --state-file=state_file.gz ] \
    [ --bzip2=/usr/bin/bzip2 ] [ --gzip=/usr/bin/gzip ] \
    rsync-args ... SRC [ ... ] DEST

=head1 DESCRIPTION

drsync is a wrapper for rsync. Unless you specify the --state-file argument,
it adds nothing: it simply calls rsync with the given parameters.

If you specify --state-file, drsync generates the current file list of your
source repositories and compares it with the list saved in the state file by
the previous run. Files that were added or deleted in the meantime are
propagated to the destination (new files are created there, and deleted files
are deleted there as well), and the state file is updated.

The state file can optionally be compressed with bzip2 or gzip; the program
detects the compression from the extension of the --state-file name.

You can use --rsync, --bzip2 and --gzip to specify the paths of these programs.

=head1 EXAMPLE

=head2 Mailbox synchronization

I use this script to synchronize my mail folders between two Linux machines.
The plan was to use both my notebook and my desktop computer to read and write
email, and I wanted to see all the folders in both places.

NOTE: There are some drawbacks to synchronizing mailboxes with drsync; see
the details at the end of this section.

I have a lot of incoming folders; all of them are located in the ~/mail
directory and named INBOX.*. They are all in "maildir" format (one mail = one
file!), because it is better suited to synchronization than the mbox format.

I use this simple script on the notebook computer to synchronize the desktop
and the notebook mailboxes:

  drsync.pl --verbose --rsh=ssh --exclude=BACKUP --recursive \
    --state-file=.mail.desktop.drsync.bz2 desktop:mail ~
  drsync.pl --verbose --rsh=ssh --recursive \
    --state-file=.mail.notebook.drsync.bz2 mail desktop:

In the first step, drsync propagates the changes (new and deleted mails) from
the desktop to the notebook; in the second, it propagates the changes from the
notebook back to the desktop.

It works properly unless you change the same file on both sides. If you do,
the version you modified last overwrites the other one! This is why maildir is
better for this purpose: there is less chance of modifying the same file on
both sides.

As I mentioned before, there are drawbacks to synchronizing the folders with
drsync:

=over 4

=item *

A mail client that happens to access the mailbox folder while it is not fully
synchronized can read 0-byte messages (the placeholder files created by
drsync), so it sees an inconsistent state.

=item *

A mail client that reads a 0-byte file in the "new" directory of a maildir
folder moves it to the "cur" directory. The file then stays 0 bytes long,
because rsync does not find it at its original location when it synchronizes.

=item *

If an attribute is changed, or a file is moved around in the file structure,
the file has to be resynchronized: drsync cannot handle "move" or "copy"
operations.

=back

The solution for this specific problem (mailbox synchronization) is the
maildirsync utility, which transfers as little data as possible between the
two machines, handles "move" and "copy" operations, and much more.

See http://hacks.dlux.hu/maildirsync for more info.

This does not mean that drsync is not good: it is perfectly good for
synchronizing directory structures that can tolerate the target being offline
while the synchronization happens.

=head1 HOW IT WORKS

rsync does most of the work, so rsync is required on both sides of the
synchronization. drsync is required only on the calling side.

First, drsync loads the file list from the file specified by the
"--state-file" command-line argument.

Then it generates the current file list (by calling rsync -n) and compares
the two states.
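
Once both lists are in memory, the comparison amounts to two set differences.
A minimal Perl sketch follows, assuming both lists are kept as hashes keyed by
filename; the helper name "diff_filelists" is hypothetical and not taken from
the DRSync module.

  use strict;
  use warnings;

  # Hypothetical helper (not part of DRSync): compare the stored file list
  # with the current one (both hash refs keyed by filename) and return two
  # sorted array refs: files added since the last run and files deleted
  # since the last run.
  sub diff_filelists {
      my ($old, $new) = @_;
      my @added   = sort { $a cmp $b } grep { !exists $old->{$_} } keys %$new;
      my @deleted = sort { $a cmp $b } grep { !exists $new->{$_} } keys %$old;
      return (\@added, \@deleted);
  }

  my %stored  = map { $_ => 1 } qw(INBOX.a/cur/1 INBOX.a/cur/2);
  my %current = map { $_ => 1 } qw(INBOX.a/cur/2 INBOX.a/new/3);
  my ($added, $deleted) = diff_filelists(\%stored, \%current);
  print "added: @$added\n";       # added: INBOX.a/new/3
  print "deleted: @$deleted\n";   # deleted: INBOX.a/cur/1

Files in the first result are the ones to create at the destination, and files
in the second are the ones to delete there.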

Then it deletes the files that were deleted and creates the files that were
newly created at the destination, using the rsync remote shell (--rsh) if
necessary. The new files are created with a 1970-01-01 timestamp (the Unix
epoch), and they are 0 bytes long.
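
For a local destination, these two operations can be sketched with core Perl
modules as below. The helper names are hypothetical, a remote destination would
need the equivalent commands run through the remote shell, and refusing to
truncate an existing file is a safety choice of this sketch. The epoch
timestamp fits the final rsync run: "--existing" only updates files that
already exist at the destination, and "--update" skips files that are newer
there, so a 0-byte file dated 1970-01-01 will be replaced by the real one.

  use strict;
  use warnings;
  use File::Basename qw(dirname);
  use File::Path qw(make_path);

  # Hypothetical helpers (not part of DRSync), for a *local* destination
  # only; a remote destination would need the same operations run through
  # the remote shell.

  # Create an empty placeholder and set its access and modification times
  # to the Unix epoch. An existing file is left untouched, never truncated.
  sub create_placeholder {
      my ($path) = @_;
      return if -e $path;
      make_path(dirname($path));
      open my $fh, '>', $path or die "Cannot create $path: $!";
      close $fh or die "Cannot close $path: $!";
      utime 0, 0, $path or die "Cannot set times on $path: $!";
  }

  # Remove a file that was deleted on the source side.
  sub delete_file {
      my ($path) = @_;
      unlink $path or warn "Cannot delete $path: $!\n";
  }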

Then the new file list is written back to disk. Note: the state file must be
on the machine where drsync.pl runs.
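
A minimal Perl sketch of writing a plain-text state file follows. The helper
name "save_state" is hypothetical, and writing to a temporary file and renaming
it afterwards is a choice made for this sketch, not necessarily what drsync
does; it keeps an interrupted run from leaving a truncated state file behind.
A compressed state file would be written through "bzip2" or "gzip" instead.

  use strict;
  use warnings;

  # Hypothetical helper (not part of DRSync): write the file list (a hash
  # ref keyed by filename) as a plain-text state file, one filename per
  # line, sorted. The data goes to a temporary name first and is renamed
  # into place, so an interrupted run cannot leave a truncated state file.
  sub save_state {
      my ($path, $files) = @_;
      my $tmp = "$path.tmp";
      open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
      print {$fh} "$_\n" for sort keys %$files;
      close $fh or die "Cannot close $tmp: $!";
      rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
  }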

Finally, drsync calls rsync again, with "--existing" and "--update", to do
the actual synchronization; this run copies the files that need to be copied.
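
The final rsync invocation can be sketched as below. The helper name
"final_rsync_command" is hypothetical; it only shows how "--existing" and
"--update" could be placed in front of the user's own rsync arguments.

  use strict;
  use warnings;

  # Hypothetical helper (not part of DRSync): build the argument list of
  # the final rsync run. The user's rsync arguments (options, SRC ..., DEST)
  # are passed through unchanged; "--existing" restricts rsync to files
  # that already exist at the destination, including the placeholders, and
  # "--update" skips files that are newer at the destination.
  sub final_rsync_command {
      my ($rsync, @rsync_args) = @_;
      return ($rsync, '--existing', '--update', @rsync_args);
  }

  my @cmd = final_rsync_command('rsync', '--recursive', 'mail', 'desktop:');
  print "@cmd\n";   # rsync --existing --update --recursive mail desktop:
  # A real run would then execute it, for example:
  #   system(@cmd) == 0 or die "rsync failed: $?\n";

Passing the list form to "system" avoids the shell, so file names with spaces
or other special characters need no quoting.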

The state file can be compressed with gzip or bzip2; the compression is
detected from the extension of the file name.
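
Loading the state file, including the choice of decompressor by extension, can
be sketched in Perl as follows. This is a minimal illustration, not the code
of the DRSync module: the helper names "open_state_reader" and "load_state"
are hypothetical, and treating a missing state file as an empty list (for
example on the very first run) is an assumption of the sketch. The
decompressor is started through a list-form piped open, so no shell is
involved.

  use strict;
  use warnings;

  # Hypothetical helpers (not part of DRSync).

  # Open a state file for reading, choosing the decompressor from the
  # extension: ".bz2" is read through "bzip2 -dc", ".gz" through
  # "gzip -dc", anything else as plain text.
  sub open_state_reader {
      my ($path, %prog) = @_;
      my $bzip2 = defined $prog{bzip2} ? $prog{bzip2} : 'bzip2';
      my $gzip  = defined $prog{gzip}  ? $prog{gzip}  : 'gzip';
      my $fh;
      if ($path =~ /\.bz2\z/) {
          # List-form pipe open: no shell, so the path needs no quoting.
          open $fh, '-|', $bzip2, '-dc', $path or die "Cannot run $bzip2: $!";
      } elsif ($path =~ /\.gz\z/) {
          open $fh, '-|', $gzip, '-dc', $path or die "Cannot run $gzip: $!";
      } else {
          open $fh, '<', $path or die "Cannot open $path: $!";
      }
      return $fh;
  }

  # Load a state file (one filename per line) into a hash keyed by
  # filename. A missing state file is treated as an empty list.
  sub load_state {
      my ($path, %prog) = @_;
      my %files;
      return \%files unless -e $path;
      my $fh = open_state_reader($path, %prog);
      while (defined(my $line = <$fh>)) {
          chomp $line;
          $files{$line} = 1 if $line ne '';
      }
      # For a piped handle, close() also reports a failing decompressor.
      close $fh or die "Error reading $path: " . ($! || "exit status $?");
      return \%files;
  }

Checking the result of close() on the piped handle, as load_state does, is one
way to add the error handling on pipe opens that the TODO section asks for.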

=head1 COMMAND-LINE SWITCHES

The script accepts most of the rsync options, and it calls rsync with the
given options.

If you _DO NOT_ specify the --state-file option, drsync calls rsync with the
command line unchanged, so check L<rsync> for more info. The following
options apply _ONLY_ if --state-file is provided on the command line.

=head2 Command-line switches for drsync.pl

=over 4

=item --state-file=filename

Sets the name of the file in which the filenames of the current state are
stored. If it has a '.bz2' extension, it is automatically compressed and
decompressed with bzip2; if it has a '.gz' extension, gzip is used. Any other
filename is treated as a plain text file with one filename per line.

=item --verbose, -v, -v=2, --verbose=2

If you specify -v or --verbose, drsync prints some debug messages; if you use
the -v=2 or --verbose=2 form, it also calls rsync in verbose mode.

=item --bzip2=bzip2_path

Specifies the path to the "bzip2" executable. If not specified, "bzip2" is
used.

=item --gzip=gzip_path

Specifies the path to the "gzip" executable. If not specified, "gzip" is
used.

=item --rsync=rsync_path

Specifies the path to the "rsync" executable. If not specified, "rsync" is
used.

=item --rsh=rsh_path

This is also an rsync option; it specifies the remote shell ("rsh") command
to run. You can also use the RSYNC_RSH environment variable to set it.

=back
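
A possible way to separate these switches from the rsync arguments is sketched
below. This is an illustration only, not the parser used by DRSync: the helper
name "parse_args" is hypothetical, and it recognizes only the "--name=value"
forms listed above. Following the descriptions above, "--rsh" is remembered
and also passed on to rsync, and "-v=2" or "--verbose=2" makes rsync verbose
too.

  use strict;
  use warnings;

  # Hypothetical sketch (not the DRSync implementation): split the command
  # line into drsync's own switches and the arguments handed to rsync.
  sub parse_args {
      my @args = @_;
      my %opt = (
          rsync   => 'rsync',
          bzip2   => 'bzip2',
          gzip    => 'gzip',
          rsh     => $ENV{RSYNC_RSH},
          verbose => 0,
      );
      my @rsync_args;
      for my $arg (@args) {
          if ($arg =~ /\A--(state-file|rsync|bzip2|gzip)=(.*)\z/s) {
              $opt{$1} = $2;
          } elsif ($arg =~ /\A(?:--verbose|-v)(?:=(\d+))?\z/) {
              $opt{verbose} = defined $1 ? $1 : 1;
          } elsif ($arg =~ /\A--rsh=(.*)\z/s) {
              $opt{rsh} = $1;            # drsync needs it for remote steps...
              push @rsync_args, $arg;    # ...and rsync gets it as well
          } else {
              push @rsync_args, $arg;    # everything else goes to rsync as-is
          }
      }
      # With -v=2 / --verbose=2, rsync itself runs in verbose mode too.
      unshift @rsync_args, '-v' if $opt{verbose} >= 2;
      return (\%opt, \@rsync_args);
  }

When no "--state-file" is given, the documented behavior is to call rsync with
the command line unchanged, so a caller would hand the original arguments to
rsync in that case rather than the filtered list.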

=head2 Overwritten rsync command-line options

These switches are available in rsync, but their meanings are changed or lost
when you use drsync.

=over 4

=item --verbose, -v

See above.

=item --existing, --update

These options are the default in drsync.pl, so you don't need to specify them.

=item -n, --dry-run

This argument instructs drsync and rsync not to make any changes to the
repositories and file lists. Use it together with the -v option.

=back
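
One way to honor "-n" is to route every modifying step through a single gate,
sketched below. The helper name "maybe_do" is hypothetical and the sketch is
not taken from DRSync; it reports each step on STDERR when verbose, which is
why "-n" is most useful together with "-v".

  use strict;
  use warnings;

  # Hypothetical helper (not part of DRSync): perform a modifying step only
  # when not in dry-run mode. With $verbose set, the step is reported on
  # STDERR either way.
  sub maybe_do {
      my ($dry_run, $verbose, $description, $action) = @_;
      if ($verbose) {
          print STDERR $dry_run ? "would $description\n" : "$description\n";
      }
      return $dry_run ? 0 : $action->();
  }

  # Usage: in dry-run mode the deletion is only reported, never done.
  maybe_do(1, 1, 'delete INBOX.a/cur/1', sub { unlink 'INBOX.a/cur/1' });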

=head1 TODO

=head2 Short-term

We need more error handling on pipe opens.

=head2 Long-term

There are no long-term plans for this software, because this mode of
operation has very strict limitations. Ideas are welcome. :-)

=head1 COPYRIGHT

Copyright (c) 2000-2004 SzabE<oacute>, BalE<aacute>zs (dLux)

All rights reserved. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=head1 AUTHOR

dLux (SzabE<oacute>, BalE<aacute>zs) <dlux@dlux.hu>

=head1 CREDITS

  - Paul Hedderly <paul@mjr.org> (debian packaging, fixes for new rsync)
  - Franz Rauscher <F.Rauscher@mainwork.com> (bugfix)
  - Pandu Rao <prao@storage.com> (cleanup)

=cut

use DRSync;

drsync(@ARGV);
