Srdjan Todorovic wrote:
Makes me wonder, ... how would one go about unit testing this given that it's a bash script?
For automated unit testing you would just write a script that populates a test tree (much like I did there), runs the script, then compares the left-over tree (just run a "find") with the expected one (a previously vetted "known good" version).
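A minimal sketch of that populate/run/compare cycle follows. It assumes the script under test deletes files older than seven days; `cleanup_under_test` is a hypothetical stand-in for it, and the file names are invented for illustration. Swap in a call to the real script and a vetted listing of your own.

```sh
#!/bin/sh
# Hypothetical stand-in for the script under test:
# remove regular files older than 7 days below the given directory.
cleanup_under_test() {
    find "$1" -type f -mtime +7 -exec rm -f {} +
}

run_test() {
    # 1. Populate a throwaway test tree with old and new files.
    tree=$(mktemp -d) || return 1
    mkdir -p "$tree/a/b"
    touch -t 200001010000 "$tree/a/old.log" "$tree/a/b/old.log"
    touch "$tree/a/new.log" "$tree/a/b/new.log"

    # 2. Run the script under test against it.
    cleanup_under_test "$tree"

    # 3. Compare the left-over tree with the "known good" listing.
    #    Sorting in the C locale keeps the order stable across systems.
    actual=$(cd "$tree" && find . | LC_ALL=C sort)
    expected=$(printf '%s\n' . ./a ./a/b ./a/b/new.log ./a/new.log)

    rm -rf "$tree"
    [ "$actual" = "$expected" ]
}

if run_test; then result=PASS; else result=FAIL; fi
echo "$result"
```

In a real setup the expected listing would live in a file next to the test, so a failure can be inspected with a plain "diff".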
<tangent> Then expand the test tree to cover more complexity: multilevel directories, different old/young timestamps, empty directories, directories and files with names starting with dashes, spaces, dots, asterisks, backslashes or high-bit characters, symlinks in the tree, symlinks outside the tree, to files and directories, etc. Rather than sticking all those cases in a single test tree you might prefer to split the test cases up, to keep the setup/verification stages easier to manage, and run them as a suite from yet another script.
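As a rough sketch of such a suite, here are two of those cases as separate functions plus a tiny runner. Again `cleanup_under_test` is only a hypothetical stand-in (delete files older than seven days), and the case names and file names are made up; the point is the one-case-per-function shape.

```sh
#!/bin/sh
# Hypothetical stand-in for the script under test.
cleanup_under_test() {
    find "$1" -type f -mtime +7 -exec rm -f {} +
}

# Case: awkward names must be removed like any other old file,
# not mistaken for options or expanded as globs.
case_awkward_names() {
    t=$(mktemp -d) || return 1
    for name in '-rf' ' leading space' '.hidden' 'star*' 'back\slash'; do
        touch -t 200001010000 "$t/$name"
    done
    cleanup_under_test "$t"
    [ -z "$(find "$t" -type f)" ]; status=$?
    rm -rf "$t"
    return "$status"
}

# Case: an old file outside the tree, reachable through a symlink
# inside the tree, must survive the cleanup.
case_symlink_outside() {
    t=$(mktemp -d) || return 1
    outside=$(mktemp -d) || return 1
    touch -t 200001010000 "$outside/keep.me"
    ln -s "$outside/keep.me" "$t/link"
    cleanup_under_test "$t"
    [ -f "$outside/keep.me" ]; status=$?
    rm -rf "$t" "$outside"
    return "$status"
}

# Runner: execute every case and count the failures.
failures=0
for c in case_awkward_names case_symlink_outside; do
    if "$c"; then
        echo "ok   $c"
    else
        echo "FAIL $c"
        failures=$((failures + 1))
    fi
done
echo "failures: $failures"
```

Each case builds and removes its own tree, so a failure in one cannot leak state into the next.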
If producers/users are modifying the tree while you're running your script, then you may need to be even more careful, and that's harder to test in an automated fashion; you might need some mock filesystem.
And of course, put it under revision control, add a license, add comments, write documentation, note dependencies (bash versions) and tested platforms, have it code reviewed. Etc etc. :-)
But first you really need to figure out what you're actually trying to achieve for this particular use case. Do you want to move these files to a review area rather than just deleting? Do you want to use some filename pattern matching to distinguish dspam logs/mailboxes and treat them differently? Does the depth of the node in the tree have some significant semantics? If you encounter unusual files, do you want to process them, or just abort so that the sysadmin will investigate? Can you perhaps change the producer to deposit its data in a different way (like year/month/day subdirectories) that are easier to dispose of? Is this just a small number of local files, or some massive distributed filesystem? Is this maintenance a one-off, or will this run regularly? Is the data mission critical? Are there audit/retention policies or backup management implications?
You can make this as complex as you choose. I just wanted to give Mark some quick inspiration before lunch :-) </tangent>
-- Martijn