From 902a81b4bcf1b133b434322370996c2cd30e0f26 Mon Sep 17 00:00:00 2001
From: Paul Rodger
Date: Tue, 26 Mar 2002 03:53:09 +0000
Subject: [PATCH] Initial revision

---
 COPYING        | 341 ++++++++++++++++++++++++++
 README         |  21 ++
 TODO           |  26 ++
 archivemail.1  | 635 +++++++++++++++++++++++++++++++++++++++++++++++++
 archivemail.py | 569 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1592 insertions(+)
 create mode 100644 COPYING
 create mode 100644 README
 create mode 100644 TODO
 create mode 100644 archivemail.1
 create mode 100755 archivemail.py

diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..86fd703
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,341 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+ + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. 
+ +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) year name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
+
diff --git a/README b/README
new file mode 100644
index 0000000..9003647
--- /dev/null
+++ b/README
@@ -0,0 +1,21 @@
+
+archivemail - archive and compress old mail in your mailbox
+
+'archivemail' is a tool written in Python for organising and storing old
+email choking any of your mailboxes. It can move messages older than a
+certain number of days to a separate 'archive' mailbox which can be
+compressed with bzip2, gzip or compress.
+
+For example, have you been subscribing to the 'linux-kernel' mailing list
+for the last 6 years and ended up with a 160-meg mailbox that 'mutt' is
+taking a long time to load? 'archivemail' can move all messages that are
+older than 6 months to a separate compressed mailbox, and leave you with
+just the most recent messages.
+
+'archivemail' can save a lot of disk space and will significantly reduce
+overhead on your mail reader. The number of days before mail is considered
+'old' is up to you, but the default is 180 days.
+
+'archivemail' currently works on mbox-format mailboxes, and requires Python
+v2.0 or greater. It also supports deleting old mail instead of archiving
+it. It currently only works on Unix platforms.
diff --git a/TODO b/TODO
new file mode 100644
index 0000000..372d861
--- /dev/null
+++ b/TODO
@@ -0,0 +1,26 @@
+
+add Maildir support
+
+add MH support
+
+start using private variables?
+
+finish man page
+
+add option to archive depending on mailbox size threshold
+    + is this a good idea?
+
+preserve atime of mailbox properly
+
+lock any original .gz files (?)
+
+check for symlink attacks for tempfiles (although we don't use /var/tmp)
+
+test for write permission before doing anything
+
+test for missing compression programs
+    + is this a waste of time?
+
+add option - do not compress (?)
+
+Add Makefile with "make install" target ?
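The README describes the core idea: messages older than a cutoff (180 days by default) are moved out of an mbox into a separate compressed archive mailbox. What follows is a minimal illustrative sketch of that idea in modern Python 3, using the standard `mailbox`, `email.utils` and `gzip` modules. It is not the code in archivemail.py, which targets Python v2.0. The helper names `is_old` and `archive_old_mail` are hypothetical. The sketch also makes several assumptions of its own: it dates each message by its `Date:` header, it keeps messages whose date cannot be parsed, and it stores the archive as a gzip stream that grows by appending.

```python
import gzip
import mailbox
import os
import shutil
import tempfile
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_old(msg, cutoff):
    """Return True if msg's Date: header is earlier than cutoff.

    Assumption of this sketch: a missing or unparseable Date: header
    counts as "new", so such a message is never archived by mistake.
    """
    header = msg.get("Date")
    if header is None:
        return False
    try:
        sent = parsedate_to_datetime(header)
    except (TypeError, ValueError, IndexError):
        return False
    if sent.tzinfo is None:
        # A naive date carries no zone; assume UTC so it can be compared.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent < cutoff


def archive_old_mail(mbox_path, archive_path, days=180, now=None):
    """Move messages older than `days` from mbox_path into a gzip-compressed
    mbox at archive_path and return how many were moved.

    Hypothetical helper for illustration only, not archivemail's own API.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)

    box = mailbox.mbox(mbox_path, create=False)
    box.lock()
    try:
        old_keys = [key for key, msg in box.iteritems() if is_old(msg, cutoff)]
        if not old_keys:
            return 0

        # Write the old messages to a temporary plain mbox first.
        fd, tmp_path = tempfile.mkstemp(
            dir=os.path.dirname(os.path.abspath(archive_path)))
        os.close(fd)
        try:
            tmp_box = mailbox.mbox(tmp_path)
            for key in old_keys:
                tmp_box.add(box[key])
            tmp_box.close()  # close() also flushes the messages to disk

            # Append to the archive; each run adds one more gzip member.
            with open(tmp_path, "rb") as src, gzip.open(archive_path, "ab") as dst:
                shutil.copyfileobj(src, dst)
        finally:
            os.remove(tmp_path)

        # Only after the archive is written are the originals removed.
        for key in old_keys:
            box.remove(key)
        box.flush()
        return len(old_keys)
    finally:
        box.unlock()
        box.close()
```

The sketch removes messages from the source mailbox only after the archive has been written, so a failure part-way through leaves the original mail in place. Appending with `gzip.open(..., "ab")` produces a multi-member gzip file, which `zcat` and Python's `gzip` module both read as one continuous stream.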
diff --git a/archivemail.1 b/archivemail.1
new file mode 100644
index 0000000..855cb16
--- /dev/null
+++ b/archivemail.1
@@ -0,0 +1,635 @@
+.\" archivemail man page
+.if !\n(.g \{\
+. if !\w|\*(lq| \{\
+. ds lq ``
+. if \w'\(lq' .ds lq "\(lq
+. \}
+. if !\w|\*(rq| \{\
+. ds rq ''
+. if \w'\(rq' .ds rq "\(rq
+. \}
+.\}
+.de Id
+.ds Dt \\$4
+..
+.TH archivemail 1 \*(Dt "GNU Project"
+.SH NAME
+archivemail \- archive and compress old email
+.SH SYNOPSIS
+.B archivemail
+.RI [ options ]
+.I FILE
+.RI [ FILE .\|.\|.]
+.br
+.SH DESCRIPTION
+.PP
+.B archivemail
+archives and compresses and
+.IR FILE s
+
+
+
+.IR PATTERN .
+By default,
+.B grep
+prints the matching lines.
+.PP
+In addition, two variant programs
+.B egrep
+and
+.B fgrep
+are available.
+.B Egrep
+is the same as
+.BR "grep\ \-E" .
+.B Fgrep
+is the same as
+.BR "grep\ \-F" .
+.SH OPTIONS
+.TP
+.BI \-A " NUM" "\fR,\fP \-\^\-after-context=" NUM
+Print
+.I NUM
+lines of trailing context after matching lines.
+.TP
+.BR \-a ", " \-\^\-text
+Process a binary file as if it were text; this is equivalent to the
+.B \-\^\-binary-files=text
+option.
+.TP
+.BI \-B " NUM" "\fR,\fP \-\^\-before-context=" NUM
+Print
+.I NUM
+lines of leading context before matching lines.
+.TP
+\fB\-C\fP [\fINUM\fP], \fB\-\fP\fINUM\fP, \fB\-\^\-context\fP[\fB=\fP\fINUM\fP]
+Print
+.I NUM
+lines (default 2) of output context.
+.TP
+.BR \-b ", " \-\^\-byte-offset
+Print the byte offset within the input file before
+each line of output.
+.TP
+.BI \-\^\-binary-files= TYPE
+If the first few bytes of a file indicate that the file contains binary
+data, assume that the file is of type
+.IR TYPE .
+By default,
+.I TYPE
+is
+.BR binary ,
+and
+.B grep
+normally outputs either
+a one-line message saying that a binary file matches, or no message if
+there is no match.
+If
+.I TYPE
+is
+.BR without-match ,
+.B grep
+assumes that a binary file does not match; this is equivalent to the
+.B \-I
+option.
+If +.I TYPE +is +.BR text , +.B grep +processes a binary file as if it were text; this is equivalent to the +.B \-a +option. +.I Warning: +.B "grep \-\^\-binary-files=text" +might output binary garbage, +which can have nasty side effects if the output is a terminal and if the +terminal driver interprets some of it as commands. +.TP +.BR \-c ", " \-\^\-count +Suppress normal output; instead print a count of +matching lines for each input file. +With the +.BR \-v ", " \-\^\-invert-match +option (see below), count non-matching lines. +.TP +.BI \-d " ACTION" "\fR,\fP \-\^\-directories=" ACTION +If an input file is a directory, use +.I ACTION +to process it. By default, +.I ACTION +is +.BR read , +which means that directories are read just as if they were ordinary files. +If +.I ACTION +is +.BR skip , +directories are silently skipped. +If +.I ACTION +is +.BR recurse , +.B grep +reads all files under each directory, recursively; +this is equivalent to the +.B \-r +option. +.TP +.BR \-E ", " \-\^\-extended-regexp +Interpret +.I PATTERN +as an extended regular expression (see below). +.TP +.BI \-e " PATTERN" "\fR,\fP \-\^\-regexp=" PATTERN +Use +.I PATTERN +as the pattern; useful to protect patterns beginning with +.BR \- . +.TP +.BR \-F ", " \-\^\-fixed-strings +Interpret +.I PATTERN +as a list of fixed strings, separated by newlines, +any of which is to be matched. +.TP +.BI \-f " FILE" "\fR,\fP \-\^\-file=" FILE +Obtain patterns from +.IR FILE , +one per line. +The empty file contains zero patterns, and therefore matches nothing. +.TP +.BR \-G ", " \-\^\-basic-regexp +Interpret +.I PATTERN +as a basic regular expression (see below). This is the default. +.TP +.BR \-H ", " \-\^\-with-filename +Print the filename for each match. +.TP +.BR \-h ", " \-\^\-no-filename +Suppress the prefixing of filenames on output +when multiple files are searched. +.TP +.B \-\^\-help +Output a brief help message. 
+.TP +.BR \-I +Process a binary file as if it did not contain matching data; this is +equivalent to the +.B \-\^\-binary-files=without-match +option. +.TP +.BR \-i ", " \-\^\-ignore-case +Ignore case distinctions in both the +.I PATTERN +and the input files. +.TP +.BR \-L ", " \-\^\-files-without-match +Suppress normal output; instead print the name +of each input file from which no output would +normally have been printed. The scanning will stop +on the first match. +.TP +.BR \-l ", " \-\^\-files-with-matches +Suppress normal output; instead print +the name of each input file from which output +would normally have been printed. The scanning will +stop on the first match. +.TP +.B \-\^\-mmap +If possible, use the +.BR mmap (2) +system call to read input, instead of +the default +.BR read (2) +system call. In some situations, +.B \-\^\-mmap +yields better performance. However, +.B \-\^\-mmap +can cause undefined behavior (including core dumps) +if an input file shrinks while +.B grep +is operating, or if an I/O error occurs. +.TP +.BR \-n ", " \-\^\-line-number +Prefix each line of output with the line number +within its input file. +.TP +.BR \-q ", " \-\^\-quiet ", " \-\^\-silent +Quiet; suppress normal output. The scanning will stop +on the first match. +Also see the +.B \-s +or +.B \-\^\-no-messages +option below. +.TP +.BR \-r ", " \-\^\-recursive +Read all files under each directory, recursively; +this is equivalent to the +.B "\-d recurse" +option. +.TP +.BR \-s ", " \-\^\-no-messages +Suppress error messages about nonexistent or unreadable files. +Portability note: unlike \s-1GNU\s0 +.BR grep , +traditional +.B grep +did not conform to \s-1POSIX.2\s0, because traditional +.B grep +lacked a +.B \-q +option and its +.B \-s +option behaved like \s-1GNU\s0 +.BR grep 's +.B \-q +option. +Shell scripts intended to be portable to traditional +.B grep +should avoid both +.B \-q +and +.B \-s +and should redirect output to /dev/null instead. 
+.TP
+.BR \-U ", " \-\^\-binary
+Treat the file(s) as binary. By default, under MS-DOS and MS-Windows,
+.BR grep
+guesses the file type by looking at the contents of the first 32KB
+read from the file. If
+.BR grep
+decides the file is a text file, it strips the CR characters from the
+original file contents (to make regular expressions with
+.B ^
+and
+.B $
+work correctly). Specifying
+.B \-U
+overrules this guesswork, causing all files to be read and passed to the
+matching mechanism verbatim; if the file is a text file with CR/LF
+pairs at the end of each line, this will cause some regular
+expressions to fail.
+This option has no effect on platforms other than MS-DOS and
+MS-Windows.
+.TP
+.BR \-u ", " \-\^\-unix-byte-offsets
+Report Unix-style byte offsets. This switch causes
+.B grep
+to report byte offsets as if the file were a Unix-style text file, i.e., with
+CR characters stripped off. This will produce results identical to running
+.B grep
+on a Unix machine. This option has no effect unless the
+.B \-b
+option is also used;
+it has no effect on platforms other than MS-DOS and MS-Windows.
+.TP
+.BR \-V ", " \-\^\-version
+Print the version number of
+.B grep
+to standard error. This version number should
+be included in all bug reports (see below).
+.TP
+.BR \-v ", " \-\^\-invert-match
+Invert the sense of matching, to select non-matching lines.
+.TP
+.BR \-w ", " \-\^\-word-regexp
+Select only those lines containing matches that form whole words.
+The test is that the matching substring must either be at the
+beginning of the line, or preceded by a non-word constituent
+character. Similarly, it must be either at the end of the line
+or followed by a non-word constituent character. Word-constituent
+characters are letters, digits, and the underscore.
+.TP
+.BR \-x ", " \-\^\-line-regexp
+Select only those matches that exactly match the whole line.
+.TP
+.B \-y
+Obsolete synonym for
+.BR \-i .
+.TP +.BR \-Z ", " \-\^\-null +Output a zero byte (the \s-1ASCII\s0 +.B NUL +character) instead of the character that normally follows a file name. +For example, +.B "grep \-lZ" +outputs a zero byte after each file name instead of the usual newline. +This option makes the output unambiguous, even in the presence of file +names containing unusual characters like newlines. This option can be +used with commands like +.BR "find \-print0" , +.BR "perl \-0" , +.BR "sort \-z" , +and +.B "xargs \-0" +to process arbitrary file names, +even those that contain newline characters. +.SH "REGULAR EXPRESSIONS" +.PP +A regular expression is a pattern that describes a set of strings. +Regular expressions are constructed analogously to arithmetic +expressions, by using various operators to combine smaller expressions. +.PP +.B Grep +understands two different versions of regular expression syntax: +\*(lqbasic\*(rq and \*(lqextended.\*(rq In +.RB "\s-1GNU\s0\ " grep , +there is no difference in available functionality using either syntax. +In other implementations, basic regular expressions are less powerful. +The following description applies to extended regular expressions; +differences for basic regular expressions are summarized afterwards. +.PP +The fundamental building blocks are the regular expressions that match +a single character. Most characters, including all letters and digits, +are regular expressions that match themselves. Any metacharacter with +special meaning may be quoted by preceding it with a backslash. +.PP +A list of characters enclosed by +.B [ +and +.B ] +matches any single +character in that list; if the first character of the list +is the caret +.B ^ +then it matches any character +.I not +in the list. +For example, the regular expression +.B [0123456789] +matches any single digit. A range of characters +may be specified by giving the first and last characters, separated +by a hyphen. +Finally, certain named classes of characters are predefined. 
+Their names are self-explanatory, and they are
+.BR [:alnum:] ,
+.BR [:alpha:] ,
+.BR [:cntrl:] ,
+.BR [:digit:] ,
+.BR [:graph:] ,
+.BR [:lower:] ,
+.BR [:print:] ,
+.BR [:punct:] ,
+.BR [:space:] ,
+.BR [:upper:] ,
+and
+.BR [:xdigit:] .
+For example,
+.B [[:alnum:]]
+means
+.BR [0-9A-Za-z] ,
+except the latter form depends upon the \s-1POSIX\s0 locale and the
+\s-1ASCII\s0 character encoding, whereas the former is independent
+of locale and character set.
+(Note that the brackets in these class names are part of the symbolic
+names, and must be included in addition to the brackets delimiting
+the bracket list.) Most metacharacters lose their special meaning
+inside lists. To include a literal
+.B ]
+place it first in the list. Similarly, to include a literal
+.B ^
+place it anywhere but first. Finally, to include a literal
+.B \-
+place it last.
+.PP
+The period
+.B .
+matches any single character.
+The symbol
+.B \ew
+is a synonym for
+.B [_[:alnum:]]
+and
+.B \eW
+is a synonym for
+.BR [^_[:alnum:]] .
+.PP
+The caret
+.B ^
+and the dollar sign
+.B $
+are metacharacters that respectively match the empty string at the
+beginning and end of a line.
+The symbols
+.B \e<
+and
+.B \e>
+respectively match the empty string at the beginning and end of a word.
+The symbol
+.B \eb
+matches the empty string at the edge of a word,
+and
+.B \eB
+matches the empty string provided it's
+.I not
+at the edge of a word.
+.PP
+A regular expression may be followed by one of several repetition operators:
+.PD 0
+.TP
+.B ?
+The preceding item is optional and matched at most once.
+.TP
+.B *
+The preceding item will be matched zero or more times.
+.TP
+.B +
+The preceding item will be matched one or more times.
+.TP
+.BI { n }
+The preceding item is matched exactly
+.I n
+times.
+.TP
+.BI { n ,}
+The preceding item is matched
+.I n
+or more times.
+.TP
+.BI { n , m }
+The preceding item is matched at least
+.I n
+times, but not more than
+.I m
+times.
+.PD +.PP +Two regular expressions may be concatenated; the resulting +regular expression matches any string formed by concatenating +two substrings that respectively match the concatenated +subexpressions. +.PP +Two regular expressions may be joined by the infix operator +.BR | ; +the resulting regular expression matches any string matching +either subexpression. +.PP +Repetition takes precedence over concatenation, which in turn +takes precedence over alternation. A whole subexpression may be +enclosed in parentheses to override these precedence rules. +.PP +The backreference +.BI \e n\c +\&, where +.I n +is a single digit, matches the substring +previously matched by the +.IR n th +parenthesized subexpression of the regular expression. +.PP +In basic regular expressions the metacharacters +.BR ? , +.BR + , +.BR { , +.BR | , +.BR ( , +and +.BR ) +lose their special meaning; instead use the backslashed +versions +.BR \e? , +.BR \e+ , +.BR \e{ , +.BR \e| , +.BR \e( , +and +.BR \e) . +.PP +Traditional +.B egrep +did not support the +.B { +metacharacter, and some +.B egrep +implementations support +.B \e{ +instead, so portable scripts should avoid +.B { +in +.B egrep +patterns and should use +.B [{] +to match a literal +.BR { . +.PP +\s-1GNU\s0 +.B egrep +attempts to support traditional usage by assuming that +.B { +is not special if it would be the start of an invalid interval +specification. For example, the shell command +.B "egrep '{1'" +searches for the two-character string +.B {1 +instead of reporting a syntax error in the regular expression. +\s-1POSIX.2\s0 allows this behavior as an extension, but portable scripts +should avoid it. +.SH "ENVIRONMENT VARIABLES" +.TP +.B GREP_OPTIONS +This variable specifies default options to be placed in front of any +explicit options. 
For example, if +.B GREP_OPTIONS +is +.BR "'\-\^\-binary-files=without-match \-\^\-directories=skip'" , +.B grep +behaves as if the two options +.B \-\^\-binary-files=without-match +and +.B \-\^\-directories=skip +had been specified before any explicit options. +Option specifications are separated by whitespace. +A backslash escapes the next character, +so it can be used to specify an option containing whitespace or a backslash. +.TP +\fBLC_ALL\fP, \fBLC_MESSAGES\fP, \fBLANG\fP +These variables specify the +.B LC_MESSAGES +locale, which determines the language that +.B grep +uses for messages. +The locale is determined by the first of these variables that is set. +American English is used if none of these environment variables are set, +or if the message catalog is not installed, or if +.B grep +was not compiled with national language support (\s-1NLS\s0). +.TP +\fBLC_ALL\fP, \fBLC_CTYPE\fP, \fBLANG\fP +These variables specify the +.B LC_CTYPE +locale, which determines the type of characters, e.g., which +characters are whitespace. +The locale is determined by the first of these variables that is set. +The \s-1POSIX\s0 locale is used if none of these environment variables +are set, or if the locale catalog is not installed, or if +.B grep +was not compiled with national language support (\s-1NLS\s0). +.TP +.B POSIXLY_CORRECT +If set, +.B grep +behaves as \s-1POSIX.2\s0 requires; otherwise, +.B grep +behaves more like other \s-1GNU\s0 programs. +\s-1POSIX.2\s0 requires that options that follow file names must be +treated as file names; by default, such options are permuted to the +front of the operand list and are treated as options. +Also, \s-1POSIX.2\s0 requires that unrecognized options be diagnosed as +\*(lqillegal\*(rq, but since they are not really against the law the default +is to diagnose them as \*(lqinvalid\*(rq. +.B POSIXLY_CORRECT +also disables \fB_\fP\fIN\fP\fB_GNU_nonoption_argv_flags_\fP, +described below. 
+.TP +\fB_\fP\fIN\fP\fB_GNU_nonoption_argv_flags_\fP +(Here +.I N +is +.BR grep 's +numeric process ID.) If the +.IR i th +character of this environment variable's value is +.BR 1 , +do not consider the +.IR i th +operand of +.B grep +to be an option, even if it appears to be one. +A shell can put this variable in the environment for each command it runs, +specifying which operands are the results of file name wildcard +expansion and therefore should not be treated as options. +This behavior is available only with the \s-1GNU\s0 C library, and only +when +.B POSIXLY_CORRECT +is not set. +.SH DIAGNOSTICS +.PP +Normally, exit status is 0 if matches were found, +and 1 if no matches were found. (The +.B \-v +option inverts the sense of the exit status.) +Exit status is 2 if there were syntax errors +in the pattern, inaccessible input files, or +other system errors. +.SH BUGS +.PP +Email bug reports to +.BR bug-gnu-utils@gnu.org . +Be sure to include the word \*(lqgrep\*(rq somewhere in the +\*(lqSubject:\*(rq field. +.PP +Large repetition counts in the +.BI { m , n } +construct may cause grep to use lots of memory. +In addition, +certain other obscure regular expressions require exponential time +and space, and may cause +.B grep +to run out of memory. +.PP +Backreferences are very slow, and may require exponential time. +.\" Work around problems with some troff -man implementations. +.br diff --git a/archivemail.py b/archivemail.py new file mode 100755 index 0000000..bae40a6 --- /dev/null +++ b/archivemail.py @@ -0,0 +1,569 @@ +#!/usr/bin/python -tt +############################################################################ +# Copyright (C) 2002 Paul Rodger +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +############################################################################ + +"""Archive and compress old mail in mbox-format mailboxes""" + +import atexit +import fcntl +import getopt +import mailbox +import os +import re +import rfc822 +import string +import sys +import tempfile +import time + +# globals +VERSION = "archivemail v0.1.0" +COPYRIGHT = """Copyright (C) 2002 Paul Rodger +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.""" + +options = None # global instance of the run-time options class +stale = None # list of files to delete on abnormal exit + +############## class definitions ############### + +class Stats: + """collect and print statistics per mailbox""" + archived = 0 + mailbox_name = None + archive_name = None + start_time = 0 + total = 0 + + def __init__(self, mailbox_name, final_archive_name): + """constructor for a new set of statistics - the mailbox names are + only used for printing a friendly message""" + self.start_time = time.time() + self.mailbox_name = mailbox_name + self.archive_name = final_archive_name + options.compressor_extension + + def another_message(self): + self.total = self.total + 1 + + def another_archived(self): + self.archived = self.archived + 1 + + def display(self): + """Display one line of archive statistics for the mailbox""" + end_time = time.time() + time_seconds = end_time - self.start_time + action = "archived" + if options.delete_old_mail: + action = 
"deleted" + print "%s: %s %d of %d message(s) in %.1f seconds" % \ + (self.mailbox_name, action, self.archived, self.total, + time_seconds) + + +class StaleFiles: + """container for remembering stale files to delete on abnormal exit""" + archive = None # tempfile for messages to be archived + compressed_archive = None # compressed version of the above + procmail_lock = None # original_mailbox.lock + retain = None # tempfile for messages to be retained + + +class Options: + """container for storing and setting our runtime options""" + archive_suffix = "_archive" + compressor = None + compressor_extension = None + days_old_max = 180 + delete_old_mail = 0 + lockfile_attempts = 5 # 5 seconds of waiting + lockfile_extension = ".lock" + quiet = 0 + script_name = os.path.basename(sys.argv[0]) + verbose = 0 + + def parse_args(self, args, usage): + """set our runtime options from the command-line arguments""" + try: + opts, args = getopt.getopt(args, '?IVZd:hqs:vz', + ["bzip2", "compress", "days=", "delete", "gzip", + "help", "quiet", "suffix=", "verbose", + "version"]) + except getopt.error, msg: + user_error(msg) + for o, a in opts: + if o == '--delete': + self.delete_old_mail = 1 + if o in ('-d', '--days'): + self.days_old_max = string.atoi(a) + if (self.days_old_max < 1): + user_error("argument to -d must be greater than zero") + if (self.days_old_max >= 10000): + user_error("argument to -d must be less than 10000") + if o in ('-h', '-?', '--help'): + print usage + sys.exit(0) + if o in ('-q', '--quiet'): + self.quiet = 1 + if o in ('-v', '--verbose'): + self.verbose = 1 + if o in ('-s', '--suffix'): + self.archive_suffix = a + if o in ('-V', '--version'): + print VERSION + "\n\n" + COPYRIGHT + sys.exit(0) + if o in ('-z', '--gzip'): + if (self.compressor): + user_error("conflicting compression options") + self.compressor = "gzip" + if o in ('-Z', '--compress'): + if (self.compressor): + user_error("conflicting compression options") + self.compressor = "compress" + if o
in ('-I', '--bzip2'): + if (self.compressor): + user_error("conflicting compression options") + self.compressor = "bzip2" + if not self.compressor: + self.compressor = "gzip" + extensions = { + "compress" : ".Z", + "gzip" : ".gz", + "bzip2" : ".bz2", + } + self.compressor_extension = extensions[self.compressor] + return args + + +class Mailbox: + """ generic read/writable 'mbox' format mailbox file""" + count = 0 + file = None + mbox = None + + def __init__(self): + """constructor: doesn't do much""" + pass + + def store(self, msg): + """write one message to the mbox file""" + vprint("saving message to file '%s'" % self.file.name) + assert(msg.unixfrom) + self.file.write(msg.unixfrom) + assert(msg.headers) + self.file.writelines(msg.headers) + self.file.write("\n") + + # The following while loop is about twice as fast in + # practice as 'self.file.writelines(msg.fp.readlines())' + while 1: + body = msg.fp.read(8192) + if not body: + break + self.file.write(body) + self.count = self.count + 1 + + def unlink(self): + """destroy the whole thing""" + if self.file: + file_name = self.file.name + self.close() + vprint("unlinking file '%s'" % self.file.name) + os.unlink(file_name) + + def get_size(self): + """determine file size of this mbox file""" + assert(self.file.name) + return os.path.getsize(self.file.name) + + def close(self): + """close the mbox file""" + if not self.file.closed: + vprint("closing file '%s'" % self.file.name) + self.file.close() + + def read_message(self): + """read one rfc822 message object from the mbox file""" + if not self.mbox: + self.file.seek(0) + self.mbox = mailbox.UnixMailbox(self.file) + assert(self.mbox) + message = self.mbox.next() + return message + + def exclusive_lock(self): + """set an advisory lock on the whole mbox file""" + vprint("obtaining exclusive lock on file '%s'" % self.file.name) + fcntl.flock(self.file, fcntl.LOCK_EX) + + def exclusive_unlock(self): + """unset any advisory lock on the mbox file""" + vprint("dropping
exclusive lock on file '%s'" % self.file.name) + fcntl.flock(self.file, fcntl.LOCK_UN) + + def procmail_lock(self): + """create a procmail-style .lock file to prevent clashes""" + lock_name = self.file.name + options.lockfile_extension + attempt = 0 + while os.path.isfile(lock_name): + vprint("lockfile '%s' exists - sleeping..." % lock_name) + time.sleep(1) + attempt = attempt + 1 + if (attempt >= options.lockfile_attempts): + user_error("Giving up waiting for procmail lock '%s'" % lock_name) + vprint("writing lockfile '%s'" % lock_name) + lock = open(lock_name, "w") + stale.procmail_lock = lock_name + lock.close() + + def procmail_unlock(self): + """delete our procmail-style .lock file""" + lock_name = self.file.name + options.lockfile_extension + vprint("removing lockfile '%s'" % lock_name) + os.unlink(lock_name) + stale.procmail_lock = None + + def leave_empty(self): + """This should be the same as 'cp /dev/null mailbox'. + This will leave a zero-length mailbox file so that mail + reading programs don't get upset that the mailbox has been + completely deleted.""" + vprint("turning '%s' into a zero-length file" % self.file.name) + atime = os.path.getatime(self.file.name) + mtime = os.path.getmtime(self.file.name) + blank_file = open(self.file.name, "w") + blank_file.close() + os.utime(self.file.name, (atime, mtime)) # reset to original timestamps + + + +class RetainMailbox(Mailbox): + """a temporary mailbox for holding messages that will be retained in the + original mailbox""" + def __init__(self): + """constructor - create the temporary file""" + temp_name = tempfile.mktemp("archivemail_retain") + self.file = open(temp_name, "w") + stale.retain = temp_name + vprint("opened temporary retain file '%s'" % self.file.name) + + def finalise(self, final_name): + """replace the original mailbox with this temp file""" + self.close() + + atime = os.path.getatime(final_name) + mtime = os.path.getmtime(final_name) + + vprint("renaming '%s' to '%s'" % (self.file.name, final_name))
+ os.rename(self.file.name, final_name) + + os.utime(final_name, (atime, mtime)) # reset to original timestamps + stale.retain = None + + def unlink(self): + """Override the base-class version, removing from stalefiles""" + Mailbox.unlink(self) + stale.retain = None + + +class ArchiveMailbox(Mailbox): + """all messages that are too old go here""" + final_name = None # archive name without the compressor extension + def __init__(self, final_name): + """copy any pre-existing compressed archive to a temp file which we + use as the new soon-to-be compressed archive""" + assert(final_name) + compressor = options.compressor + compressedfilename = final_name + options.compressor_extension + + if os.path.isfile(final_name): + user_error("There is already a file named '%s'!" % (final_name)) + + temp_name = tempfile.mktemp("archivemail_archive") + + if os.path.isfile(compressedfilename): + vprint("found existing compressed archive '%s'" % compressedfilename) + uncompress = "%s -d -c %s > %s" % (compressor, + compressedfilename, temp_name) + vprint("running uncompressor: %s" % uncompress) + stale.archive = temp_name + system_or_die(uncompress) + + stale.archive = temp_name + self.file = open(temp_name, "a") + self.final_name = final_name + + def finalise(self): + """compress the temp file and rename the result to the final + compressed archive name""" + self.close() + compressor = options.compressor + compressed_archive_name = self.file.name + options.compressor_extension + compress = compressor + " " + self.file.name + vprint("running compressor: '%s'" % compress) + + stale.compressed_archive = compressed_archive_name + system_or_die(compress) + stale.archive = None + + compressed_final_name = self.final_name + options.compressor_extension + vprint("renaming '%s' to '%s'" % (compressed_archive_name, + compressed_final_name)) + os.rename(compressed_archive_name, compressed_final_name) + stale.compressed_archive = None + + +class OriginalMailbox(Mailbox): + """This is the mailbox that we read messages from to determine if they are
too old. We will never write to this file directly except at the end + where we override the whole file with the RetainMailbox.""" + file = None + def __init__(self, mailbox_name): + """open the mailbox, ready for reading""" + try: + self.file = open(mailbox_name, "r") + except IOError, msg: + user_error(msg) + + +def main(args = sys.argv[1:]): + global options + global stale + + options = Options() + usage = """Usage: %s [options] mailbox [mailbox...] +Moves old mail messages in mbox-format mailboxes to compressed mailbox +archives. This is useful for saving space and keeping your mailbox manageable. + Options are as follows: + -d, --days= archive messages older than days (default: %d) + -s, --suffix= suffix for archive filename (default: '%s') + -z, --gzip compress the archive using gzip (default) + -I, --bzip2 compress the archive using bzip2 + -Z, --compress compress the archive using compress + --delete delete rather than archive old mail (use with caution!) + -v, --verbose report lots of extra debugging information + -q, --quiet quiet mode - print no statistics (suitable for crontab) + -V, --version display version information + -h, --help display this message +Example: %s linux-devel + This will move all messages older than %s days to a file called + 'linux-devel_archive.gz', deleting them from the original 'linux-devel' + mailbox. If the 'linux-devel_archive.gz' mailbox already exists, the + newly archived messages are appended. 
+""" % (options.script_name, options.days_old_max, options.archive_suffix, + options.script_name, options.days_old_max) + + check_python_version() + + args = options.parse_args(args, usage) + if len(args) == 0: + print usage + sys.exit(1) + + os.umask(077) # saves setting permissions on mailboxes/tempfiles + stale = StaleFiles() + atexit.register(clean_up) + + for filename in args: + tempfile.tempdir = os.path.dirname(filename) # same filesystem as the mailbox (not /var/tmp), so os.rename() works + final_archive_name = filename + options.archive_suffix + archive_mailbox(mailbox_name = filename, + final_archive_name = final_archive_name) + + + +######## errors and debug ########## + +def vprint(string): + """this saves putting 'if (verbose) print foo' everywhere""" + if options.verbose: + print string + + +def user_error(string): + """fatal error, probably something the user did wrong""" + script_name = options.script_name + message = "%s: %s\n" % (script_name, string) + + sys.stderr.write(message) + sys.exit(1) + +########### operations on a message ############ + +def is_too_old(message): + """return true if a message is too old (and should be archived), + false otherwise""" + date = message.getdate('Date') + delivery_date = message.getdate('Delivery-date') + use_date = None + time_message = None + + if delivery_date: + try: + time_message = time.mktime(delivery_date) + use_date = delivery_date + vprint("using message 'Delivery-date' header") + except ValueError: + pass + if date and not use_date: + try: + time_message = time.mktime(date) + use_date = date + vprint("using message 'Date' header") + except ValueError: + pass + if not use_date: + # no usable date header, so retain the message + vprint("no valid dates found for message") + return 0 + + time_now = time.time() + if time_message > time_now: + time_string = time.asctime(use_date) + vprint("warning: message has date in the future: %s !"
% time_string) + return 0 + + secs_old_max = (options.days_old_max * 24 * 60 * 60) + days_old = (time_now - time_message) / 24 / 60 / 60 + vprint("message is %.2f days old" % days_old) + + if ((time_message + secs_old_max) < time_now): + return 1 + return 0 + + +############### mailbox operations ############### + +def archive_mailbox(mailbox_name, final_archive_name): + """process and archive the given mailbox name""" + archive = None + retain = None + + vprint("archiving '%s' to '%s' ..." % (mailbox_name, final_archive_name)) + stats = Stats(mailbox_name, final_archive_name) + + original = OriginalMailbox(mailbox_name) + if original.get_size() == 0: + original.close() + vprint("skipping '%s' because it is a zero-length file" % + original.file.name) + if not options.quiet: + stats.display() + return + original.procmail_lock() + original.exclusive_lock() + + msg = original.read_message() + if not msg: + user_error("file '%s' is not in 'mbox' format" % original.file.name) + + while (msg): + stats.another_message() + message_id = msg.get('Message-ID') + vprint("processing message '%s'" % message_id) + if is_too_old(msg): + stats.another_archived() + if options.delete_old_mail: + vprint("decision: delete message") + else: + vprint("decision: archive message") + if (not archive): + archive = ArchiveMailbox(final_archive_name) + archive.store(msg) + else: + vprint("decision: retain message") + if (not retain): + retain = RetainMailbox() + retain.store(msg) + msg = original.read_message() + vprint("finished reading messages") + + original.exclusive_unlock() + original.close() + + if options.delete_old_mail: + # we will never have an archive file + if retain: + retain.finalise(mailbox_name) + else: + original.leave_empty() + elif archive: + archive.finalise() + if retain: + retain.finalise(mailbox_name) + else: + original.leave_empty() + else: + # There was nothing to archive + if retain: + # retain will be the same as original mailbox -- no point copying + retain.close()
+ retain.unlink() + + original.procmail_unlock() + if not options.quiet: + stats.display() + + +############### misc functions ############### + +def clean_up(): + """This is run on exit to make sure we haven't left any stale + files/lockfiles on the system""" + vprint("cleaning up ...") + if stale.procmail_lock: + vprint("removing stale procmail lock '%s'" % stale.procmail_lock) + try: os.unlink(stale.procmail_lock) + except (IOError, OSError): pass + if stale.retain: + vprint("removing stale retain file '%s'" % stale.retain) + try: os.unlink(stale.retain) + except (IOError, OSError): pass + if stale.archive: + vprint("removing stale archive file '%s'" % stale.archive) + try: os.unlink(stale.archive) + except (IOError, OSError): pass + if stale.compressed_archive: + vprint("removing stale compressed archive file '%s'" % + stale.compressed_archive) + try: os.unlink(stale.compressed_archive) + except (IOError, OSError): pass + + +def check_python_version(): + """make sure we are running with the right version of python""" + build = sys.version + too_old_error = "requires python v2.0 or greater. Your version is: %s" % build + try: + version = sys.version_info # sys.version_info is missing before python v2.0 + except AttributeError: + user_error(too_old_error) + if (version[0] < 2): + user_error(too_old_error) + + +def system_or_die(command): + """Give a user_error() if the command we ran returned a non-zero status""" + rv = os.system(command) + if (rv != 0): + status = os.WEXITSTATUS(rv) + user_error("command '%s' returned status %d" % (command, status)) + + +# this is where it all happens, folks +if __name__ == '__main__': + main()
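The age test in is_too_old() in the patch above reduces to one comparison: a message is archived when its timestamp plus the --days window still lies before the current time. The following is a minimal Python 3 sketch of that rule, not the patch's code. It assumes the modern email.utils module in place of the rfc822 module the patch imports, takes now as a parameter so results are reproducible, and uses the hypothetical name is_older_than. Unlike is_too_old(), which passes the parsed tuple to time.mktime() and so ignores the header's UTC offset, this sketch applies the offset through mktime_tz().

```python
from email.utils import mktime_tz, parsedate_tz

SECONDS_PER_DAY = 24 * 60 * 60


def is_older_than(date_header, days_old_max, now):
    """Return True if date_header lies more than days_old_max days before now.

    Follows the same policy as is_too_old(): an unparseable date or a
    date in the future counts as "not too old", so the message is kept.
    """
    parsed = parsedate_tz(date_header)
    if parsed is None:
        return False  # no usable date: retain the message
    time_message = mktime_tz(parsed)  # applies the header's UTC offset
    if time_message > now:
        return False  # date in the future: retain the message
    return time_message + days_old_max * SECONDS_PER_DAY < now
```

As in the patch, a message exactly days_old_max days old is not yet archived, because the comparison is strict.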
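procmail_lock() in the patch tests for the lock file with os.path.isfile() and only afterwards creates it, so two processes can both see "no lock" and both go ahead. A common way to close that window is to let open() create the file with O_CREAT | O_EXCL, which fails if the file already exists. The following Python 3 sketch shows that alternative; it is not what the patch does. It assumes the same ".lock" suffix and retry count as the Options class, uses the hypothetical names acquire_dotlock and release_dotlock, and raises TimeoutError where the patch calls user_error().

```python
import os
import time


def acquire_dotlock(mailbox_path, attempts=5, delay=1.0, extension=".lock"):
    """Atomically create mailbox_path + extension and return its path.

    os.open() with O_CREAT | O_EXCL raises FileExistsError when the lock
    file already exists, so checking for the lock and creating it happen
    in a single step instead of two.
    """
    lock_name = mailbox_path + extension
    for attempt in range(attempts):
        try:
            fd = os.open(lock_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            if attempt + 1 < attempts:
                time.sleep(delay)  # someone else holds the lock: wait, retry
            continue
        os.close(fd)  # the file's existence is the lock; no content needed
        return lock_name
    raise TimeoutError("giving up waiting for lock %r" % lock_name)


def release_dotlock(lock_name):
    """Remove a lock file created by acquire_dotlock()."""
    os.unlink(lock_name)
```

The caller would still record the returned path (as the patch does with stale.procmail_lock) so an abnormal exit can remove it.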
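ArchiveMailbox in the patch decompresses any existing archive into a temporary file, appends the old messages, and recompresses the whole file. For gzip specifically there is a cheaper alternative. The gzip format allows several compressed members to be concatenated in one file. Python's gzip module, opened in append mode, writes a new member after the existing ones, and readers decompress all members in sequence. The following Python 3 sketch shows that alternative, which is not what the patch does. append_to_gzip_archive is a hypothetical name, and the bytes passed to it are assumed to be already in mbox format.

```python
import gzip


def append_to_gzip_archive(archive_path, mbox_bytes):
    """Append mbox-formatted bytes to archive_path as a new gzip member.

    Mode "ab" creates the file if it does not exist. Otherwise the
    existing members are left untouched and a new member is written
    after them.
    """
    with gzip.open(archive_path, "ab") as archive:
        archive.write(mbox_bytes)
```

The trade-off is somewhat weaker compression, since each member is compressed independently of the ones before it.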
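The GREP_OPTIONS entry in the grep manual page carried by this patch gives two rules: options are separated by whitespace, and a backslash escapes the next character. The following Python 3 sketch implements only those two rules, for illustration. split_grep_options is a hypothetical name. Keeping a trailing backslash that has nothing after it as a literal backslash is an assumption, because the manual page does not address that case.

```python
def split_grep_options(value):
    """Split a GREP_OPTIONS-style string into separate option words."""
    words = []
    current = []
    in_word = False
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # A backslash escapes the next character, including whitespace
            # or another backslash. Assumption: a trailing lone backslash
            # is kept as a literal backslash.
            current.append(next(chars, "\\"))
            in_word = True
        elif ch.isspace():
            if in_word:  # whitespace ends the current word
                words.append("".join(current))
                current = []
                in_word = False
        else:
            current.append(ch)
            in_word = True
    if in_word:
        words.append("".join(current))
    return words
```

With the manual page's own example value, split_grep_options("--binary-files=without-match --directories=skip") yields the two options listed there.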
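Under -Z (--null), as described in the grep manual page in this patch, grep -l ends every file name with a zero byte instead of a newline, so names that contain newlines pass through intact. The following Python 3 sketch reads such output. split_nul_terminated is a hypothetical name, and the byte strings in the checks stand in for real grep output.

```python
def split_nul_terminated(data):
    """Split NUL-terminated names, as printed by 'grep -lZ', into a list.

    Splitting on b"\\0" rather than on newlines keeps names that contain
    newline characters in one piece.
    """
    names = data.split(b"\0")
    if names and names[-1] == b"":
        names.pop()  # every name, including the last, ends with a NUL
    return names
```

In practice the bytes could come from something like subprocess.run(["grep", "-lZ", pattern, *files], capture_output=True).stdout. The names stay as bytes because file names need not be valid text in any particular encoding.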