Organizing Photos by Date on the Linux Command Line

My family has collected over sixteen years' worth of digital photos, videos and so on at this point, and I needed a good way to get them organized. After trying digiKam, Shotwell, F-Spot and others, I realized that what I really wanted was to sort everything onto a timeline, by date, in a folder on my network storage server itself, without involving a GUI if possible.

That, perhaps inevitably, led me to exiftool: an example of a programmer gone wild in a frenzy of feature fixation. This thing does not follow the UNIX doctrine of doing one thing and doing it well; instead it does many things, but it did do the thing I needed very well. Behold, this one-liner of wonder:

exiftool -i SYMLINKS -v -o . \
  "-filename<filemodifydate" "-filename<createdate" "-filename<datetimeoriginal" \
  -d "${DESTINATION}/%%le/%Y/%m/%d/%Y_%m_%d_%H-%M-%S_%%c_%%f.%%le" \
  -r "${SOURCE}"

Glorious, is it not? So what does it do, exactly? The first line asks it to be verbose (-v) and, via -o ., to copy the files from the source directory rather than move them, as is the default. The next line directs exiftool to use the most accurate of three possible dates: they are listed from worst to best because the tool assigns them in order, so the last valid one wins.

The formatting option (-d) is where the real fun happens. I chose to use the full path for ${DESTINATION} (this is a variable; replace it with the path to your own destination first). The option which is non-obvious to a Linux user is %%le, which has two parts: the l says to lowercase whatever follows it, and the e is the file's extension. The year/month/day and hour/minute/second codes are fairly obvious. The %%c tells exiftool to increment a counter if another file would otherwise get exactly the same name (this lets you swiftly weed out dupes later), and %%f is the original filename, which is worth keeping in the new name if you want to preserve descriptive names while still preserving the timeline. Finally, it lowercases the extension because I prefer that for later uses of find.
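
To make the format string concrete, here is a sketch of the path a single photo would get. The destination, timestamp and filename are invented for the illustration; %%c is empty for the first copy of a name, which is why a double underscore appears:

```shell
# Assumptions for the demo: DESTINATION=/srv/photos, a .jpg shot at
# 2012-07-04 14:30:05, originally named IMG_1234.JPG, first copy (empty %%c).
TS="2012-07-04 14:30:05"
echo "/srv/photos/jpg/$(date -d "$TS" +%Y/%m/%d/%Y_%m_%d_%H-%M-%S)__IMG_1234.jpg"
# → /srv/photos/jpg/2012/07/04/2012_07_04_14-30-05__IMG_1234.jpg
```

Note that only the extension is lowercased (via %%le); the original filename part keeps its case.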

The last option (-r) points the tool at the base folder of your mess-o-photos, where it will begin its crawl. I recommend ensuring that the folder is on a filesystem mounted nodev, because I realized there was a "dev" folder still present from one of my full-cellphone dumps in there, and that you avoid sending the tool into a loop by using the don't-follow-symlinks option (-i SYMLINKS).

Eliminating Duplicates and Junk

Exiftool will merrily sort all sorts of files that you might not expect, from PDFs to obscure image types, and it ignores whatever it doesn't understand. The format string above therefore gives you an easy way to dispose of videos or photos you don't want in one rm -rf. However, there will still be duplicates of photos in your other folders.
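
Because the format string's leading %%le puts each file type in its own top-level folder, disposing of whole categories really is a one-liner. A sketch, with folder names invented for the demo (mktemp stands in for your real destination):

```shell
# Demo setup: pretend exiftool already sorted files into per-extension folders.
DESTINATION=$(mktemp -d)
mkdir -p "${DESTINATION}/mov" "${DESTINATION}/3gp" "${DESTINATION}/jpg"

# Drop the video formats in one go; only the jpg folder survives.
rm -rf "${DESTINATION}/mov" "${DESTINATION}/3gp"
ls "${DESTINATION}"
```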

If your sole concern is reducing disk space used by dupes, you could use fslint to hardlink all identical files in one swell foop, or if you’re bold, delete the duplicates. I preferred to take a more hands-on approach here, though fslint or similar would potentially do better.
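
If you want to see what fslint would find without installing it, a minimal plain-shell sketch of the same idea is to hash every file and print only the lines whose hash repeats (uniq -w32 compares just the 32-character md5; the demo files below are invented):

```shell
# Demo directory with two identical files and one distinct file.
DEST=$(mktemp -d)
echo same  > "${DEST}/a.jpg"
echo same  > "${DEST}/b.jpg"
echo other > "${DEST}/c.jpg"

# Hash, sort by hash, then show every line belonging to a repeated hash.
find "${DEST}" -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated
```

Here only a.jpg and b.jpg are printed; what you then do with the groups (hardlink, delete, inspect) is up to you.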

Here’s an example one-liner that will show you all duplicates (evidenced by their counter increment):

for f in $(find "${DESTINATION}" -regex '.*[0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9]_.*'); do \
 G=$(echo "$f" | sed 's/_[0-9]_/__/'); \
 md5sum "$G" "$f"; \
done

Replace ${DESTINATION} with the path you want to search; the per-extension subfolder, e.g. “jpg”, would do nicely. If you have many files, direct the output to a file that you can browse and sort through.

This isn’t perfect: it will miss copies numbered 10 and above (though you can go have a look yourself if you see that many), and if some files were already named with a timestamp format of hh-mm-ss followed by an underscore and a single digit, they might accidentally be considered duplicates.
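
Given that caveat, a cautious way to act on the list is to let cmp confirm a counter-suffixed copy is byte-identical to its base file before deleting it. A hedged sketch, with filenames invented for the demo:

```shell
# Demo setup: a base file and an exact "_1_" counter copy of it.
DEST=$(mktemp -d)
echo photo > "${DEST}/2012_07_04_14-30-05__IMG_1234.jpg"
cp "${DEST}/2012_07_04_14-30-05__IMG_1234.jpg" \
   "${DEST}/2012_07_04_14-30-05_1_IMG_1234.jpg"

f="${DEST}/2012_07_04_14-30-05_1_IMG_1234.jpg"
g=$(echo "$f" | sed 's/_[0-9]_/__/')   # derive the base filename
cmp -s "$f" "$g" && rm "$f"            # remove the dupe only on an exact match
ls "${DEST}"
```

If the two files differ at all, cmp fails and nothing is deleted, which is exactly the hands-on behaviour I wanted.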
