On Unix, most scripts don't come with a file extension; it's not needed, and
we distribute the script as "archivemail" anyway. And most importantly, I
like it better without the extension. :)
With a little trick we can still load the script as a module from the test
suite.
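
For illustration, here is a minimal sketch of how such a load can be done on
a current Python 3. It is an assumption about the approach, not necessarily
the trick the test suite uses; the helper name load_script() is hypothetical.

```python
import importlib.machinery
import importlib.util
import sys


def load_script(name, path):
    """Load the extensionless Python script at `path` as module `name`."""
    # Without a ".py" suffix the import system cannot guess a loader from
    # the file name, so we name the source loader explicitly.
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as a normal import would.
    sys.modules[name] = module
    loader.exec_module(module)
    return module
```

A test module could then call load_script("archivemail", "./archivemail").
This only works cleanly if the script guards its entry point with
`if __name__ == "__main__":`, since the module body is executed on load.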
Notable items that are now resolved or implemented:
* archives are now locked
* the mbox classes have been refactored to a cleaner design
* we moved from flock locking to fcntl
* the setuid() feature is long gone
* symlink attacks for tempfiles are not possible (that is really an
ancient TODO item from the original author)
* the test suite now has a lot of maildir test cases
I just discovered that archivemail's MH support is broken with respect to
message flags, and in my opinion it doesn't make much sense to test
known-broken functionality.
In fact there may well be zero archivemail users with MH mailboxes; MH is
basically an obsolete format, and any archivemail user with MH mailboxes would
probably have complained about lost message flags.
This code is complex, too complex actually. Rename some methods and
variables, rework some code and add some explanatory comments in order to
make it at least a bit easier to understand.
This should minimize the risk of data loss. Flushing a locked mbox file
before unlocking it also ensures that there's no window when another
process could lock the mbox after us, but still see the old content.
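
The ordering described above can be sketched as follows. This is a minimal
illustration, not the actual archivemail code: the function name
flush_and_unlock() is hypothetical, and the os.fsync() call is an assumption
about what "flushing" includes.

```python
import fcntl
import os


def flush_and_unlock(mbox_file):
    """Push pending writes out, then release the fcntl lock on the file."""
    # Empty Python's userspace buffer into the kernel.
    mbox_file.flush()
    # Ask the kernel to write the data to disk (assumed, see lead-in).
    os.fsync(mbox_file.fileno())
    # Only now let other processes in; they will see the new content.
    fcntl.lockf(mbox_file, fcntl.LOCK_UN)
```

If the unlock came first, another process could take the lock while our
last writes were still sitting in the buffer.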
The mbox locking methods move into a new class LockableMboxMixin, and the
Mbox and ArchiveMbox classes become subclasses of LockableMboxMixin.
The StaleFiles class is updated to handle multiple dotlock files.
In particular:
* If writing the archived messages to the final archive fails, try to
restore the archive and abort (by not handling the exception). This is
possible since we first save the archive, and only then the modified
mailbox, so we don't corrupt the original mbox in this case.
* If writing a modified mbox file fails, save the temporary copy.
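
A rough sketch of that ordering follows. Everything in it is an assumption
made for illustration: the function name commit_archiving(), its parameters,
the ".saved" suffix, and restoring the archive by truncating it back to its
old size are not taken from the archivemail code.

```python
import os
import shutil


def commit_archiving(archive_path, archived_tmp, mbox_path, retained_tmp):
    """Append archived_tmp to the archive, then rewrite the mbox."""
    # Step 1: the archive. Remember its size so a failed append can be
    # undone by truncating back to it.
    with open(archive_path, "ab") as archive:
        old_size = archive.seek(0, os.SEEK_END)
        try:
            with open(archived_tmp, "rb") as src:
                shutil.copyfileobj(src, archive)
            archive.flush()
            os.fsync(archive.fileno())
        except Exception:
            archive.truncate(old_size)  # restore the archive ...
            raise                       # ... and abort; mbox is untouched

    # Step 2: the mailbox. Reached only if the archive is safely written.
    try:
        with open(retained_tmp, "rb") as src, open(mbox_path, "r+b") as mbox:
            shutil.copyfileobj(src, mbox)
            mbox.truncate()  # cut off what is left of the old content
    except Exception:
        # Keep the temporary copy of the retained messages around.
        shutil.copy2(retained_tmp, mbox_path + ".saved")  # hypothetical name
        raise
```

Because the archive is written first, a failure in step 1 leaves the
original mbox exactly as it was.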
The RetainMbox and ArchiveMbox classes are now gone, mainly because their
finalise() methods were messing with the archived mbox and the archive,
respectively, which was not good OO design.
The core functionality of the finalise() methods of both removed classes
is moved to the objects that are manipulated: the Mbox class representing
the mbox that is being archived gains a new method overwrite_with(), and
there is a new class ArchiveMbox that represents the actual archive, which
has an append() method (yes, unfortunately the new class has the same name
as the removed class).
The RetainMbox instance is replaced with a TempMbox, and the ArchiveMbox
instance either with a TempMbox, or a CompressedTempMbox if archive
compression is enabled.
Finally, a compressed TempMbox is now implemented as a subclass of
TempMbox, named CompressedMbox.
Cooperation with the StaleFiles class moves into the TempMbox class.
This means slightly less detailed verbose cleanup reporting, oh well.
We used to create a dotlock file first and then lock with fcntl; swap that
order, since locking first with fcntl seems to be more common.
This patch also adds general mbox lock/unlock methods, which call the
dotlock and fcntl-lock methods, and moves the retry logic there.
When the dotlock and fcntl methods fail to acquire a lock, they now raise
a custom exception "LockUnavailable", which gets caught in the general
lock() method. That way, if we manage to acquire one lock but fail to
acquire the other, we can release our locks at the upper level and retry.
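
The scheme described above might look roughly like the sketch below. Only
the exception name LockUnavailable and the fcntl-before-dotlock order come
from this patch; the class name LockingSketch, the method names, the retry
parameters and the final RuntimeError are hypothetical. The dotlock is
simplified to an O_EXCL create, which is an assumption and not necessarily
how archivemail creates its dotlock files.

```python
import errno
import fcntl
import os
import time


class LockUnavailable(Exception):
    """Raised when a single locking method cannot get its lock."""


class LockingSketch:
    def __init__(self, path, retries=5, delay=1.0):
        self.path = path
        self.file = open(path, "r+b")
        self.retries = retries
        self.delay = delay

    def _fcntl_lock(self):
        try:
            fcntl.lockf(self.file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                raise LockUnavailable("fcntl lock is held elsewhere")
            raise

    def _fcntl_unlock(self):
        fcntl.lockf(self.file, fcntl.LOCK_UN)

    def _dotlock(self):
        flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
        try:
            fd = os.open(self.path + ".lock", flags, 0o644)
        except FileExistsError:
            raise LockUnavailable("dotlock file already exists")
        os.close(fd)

    def _dotlock_unlock(self):
        os.unlink(self.path + ".lock")

    def lock(self):
        """Take both locks, fcntl first; retry if either is unavailable."""
        for attempt in range(self.retries):
            fcntl_held = False
            try:
                self._fcntl_lock()
                fcntl_held = True
                self._dotlock()
                return
            except LockUnavailable:
                # Release whatever we did get, so we never sit on one lock
                # while waiting for the other.
                if fcntl_held:
                    self._fcntl_unlock()
                if attempt + 1 < self.retries:
                    time.sleep(self.delay)
        raise RuntimeError("could not lock %s" % self.path)

    def unlock(self):
        self._dotlock_unlock()
        self._fcntl_unlock()
```

Keeping the retry loop in lock() means the two low-level methods stay
simple: each either gets its lock or raises LockUnavailable.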
These helper methods provide success verification after test archiving runs, and
test case setup. This is a tradeoff: because these methods need to support all
scenarios in one place, they introduce some new complexity - but they replace a
lot of tedious, very similar, but still not entirely identical code all over the
place.
TestArchiveMboxPreserveStatus actually doesn't test that the message
status is preserved, but that the --preserve-unread option works.
Rename it to TestArchiveMboxPreserveUnread.
The test suite used to run a lot of triple tests, by first calling
archivemail.archive() directly, and then running the entire archivemail
script twice, once with long and once with short options. But we already
test option processing separately, and beyond that, archivemail.main()
essentially just calls archive() for each mailbox in turn. So we just drop
all runs of the entire archivemail script from the test suite, giving it a
huge speed boost (on my old iBook, running the test suite drops from 73 to
5 seconds).