Automate the boring stuff of your amateur photographer life

Recently, I went on holidays to the Vosges mountains in northeastern France. While one or two days were definitely too rainy to take electronics outside, I was able to take some pics with my Micro-Four-Thirds (MFT) camera of the beautiful autumn landscape, of our family dog (Team #rdogs!) and of the many, many fly agarics.

Back home and with a free weekend all to myself, I ventured to sort the photos and send the best ones to my family + friends who were with me on the trip. This is always my least favorite part because I take a lot of pictures and a lot of them are…well…not worth the time it takes to look at them.

So I opened the photo viewer on my Linux laptop, went through the photos and deleted the ones I didn’t like. “Done”, you’d think. Well, no. Why? Because, some months ago, I decided I really needed to have RAW files - just in case I’d ever want to seriously edit something (spoiler: I’m too lazy for that). So whenever I push the shutter button nowadays, two files with the same name are stored on my SD card: a normal JPG file and a RAW file with the RW2 extension. For example, P1120006.JPG and P1120006.RW2.

However, the Linux photo viewer only shows me the JPG files. So after an hour of deleting JPGs, I still needed to delete the RW2 files that belonged to the JPGs I had deleted. And my dislike for doing stuff in the explorer / finder was big enough that I decided to automate this. Because the offending files are already deleted, I set up a little test case for this post, but I’ll include some screenshots that show how much time - and nerves - this little R exercise saved me.

Step 1: Get the data

First up is actually getting the file paths. For this, I use the good old list.files function, which returns all files in a given folder. I get both the bare file name and the full path to each file.1

# delete RAW files where the jpg is deleted
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)
library(here)

# FOLDER <- "/home/frie/Pictures/2019/2019-10_vogesen/"
FOLDER <- here::here("static", "media", "data", "2019-10-19-automate-the-boring-stuff")
full_paths <- list.files(FOLDER, full.names = TRUE)
file_names <- list.files(FOLDER)

df <- tibble::tibble(full_path = full_paths, file_name = file_names)

df %>% select(-full_path)
## # A tibble: 8 x 1
##   file_name
##   <chr>
## 1 P1120001.RW2
## 2 P1120002.JPG
## 3 P1120002.RW2
## 4 P1120003.JPG
## 5 P1120003.RW2
## 6 P1120006.JPG
## 7 P1120006.RW2
## 8 P1120008.RW2

There are 8 files in the folder. By manually looking at the data, I can easily see that I want to delete P1120001.RW2 and P1120008.RW2.

In the real case, there were 942 files 😱. No way to spot the leftovers at a glance!
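As a small aside, the second list.files call can be avoided by deriving the file name from the full path with base R's basename(). The sketch below wraps this in a helper called list_photo_files; that name and the throwaway demo folder are made up for illustration and are not part of the original script.

```r
library(tibble)

# Hypothetical helper (not from the original post): build the file table
# from a single list.files() call.
list_photo_files <- function(folder) {
  full_paths <- list.files(folder, full.names = TRUE)
  tibble::tibble(
    full_path = full_paths,
    file_name = basename(full_paths)  # strip the directory part
  )
}

# Try it on a throwaway folder filled with empty stand-in files.
demo_dir <- file.path(tempdir(), "list-files-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("P1120001.RW2", "P1120002.JPG", "P1120002.RW2")))

demo_df <- list_photo_files(demo_dir)
demo_df$file_name
```

The result has the same two columns as df above, so the rest of the steps would work unchanged.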

Step 2: Determine which files need to be deleted

Fortunately, the RW2 and JPG versions have the same file name, except for the extension. I first extract this “common” element of the file name using tidyr::separate, which splits a character column at a certain pattern (the sep argument) and directly puts the pieces into new columns (hard to explain 😄, just see the result and compare with before!). This is honestly one of my favorite functions ever because it’s such a common task that would otherwise be really annoying. 2

df <- df %>%
  tidyr::separate(file_name, into = c("file_name_without_ext", "ext"), sep = "\\.")
df
## # A tibble: 8 x 3
##   full_path                                        file_name_without… ext
##   <chr>                                            <chr>              <chr>
## 1 /Users/frie/Documents/plus1/blog/static/media/d… P1120001           RW2
## 2 /Users/frie/Documents/plus1/blog/static/media/d… P1120002           JPG
## 3 /Users/frie/Documents/plus1/blog/static/media/d… P1120002           RW2
## 4 /Users/frie/Documents/plus1/blog/static/media/d… P1120003           JPG
## 5 /Users/frie/Documents/plus1/blog/static/media/d… P1120003           RW2
## 6 /Users/frie/Documents/plus1/blog/static/media/d… P1120006           JPG
## 7 /Users/frie/Documents/plus1/blog/static/media/d… P1120006           RW2
## 8 /Users/frie/Documents/plus1/blog/static/media/d… P1120008           RW2
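Splitting at the dot works here because camera file names contain exactly one dot. For names with more dots, tidyr::separate with two target columns would warn and discard the extra piece. A possible alternative, sketched below with made-up file names, is to use file_path_sans_ext and file_ext from the tools package that ships with R; this is a variation, not what the original script does.

```r
library(dplyr)
library(tibble)

# Made-up file names; the last one contains two dots on purpose.
files <- tibble::tibble(
  file_name = c("P1120001.RW2", "P1120002.JPG", "holiday.final.JPG")
)

files_split <- files %>%
  dplyr::mutate(
    # everything before the last dot
    file_name_without_ext = tools::file_path_sans_ext(file_name),
    # everything after the last dot
    ext = tools::file_ext(file_name)
  )

files_split
```

Unlike separate, this keeps the original file_name column, which may or may not be what you want.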

Now I count how many files exist for each file_name_without_ext by grouping by that variable and counting the number of rows using the little magic n() function from dplyr. This is such a common pattern and I love that dplyr makes this so easy - I remember doing this for my Bachelor thesis without the tidyverse and it was soo difficult for me.

# could be replaced by shorthand: dplyr::add_count(file_name_without_ext)
df <- df %>%
  dplyr::group_by(file_name_without_ext) %>%
  dplyr::mutate(n = n())
df
## # A tibble: 8 x 4
## # Groups:   file_name_without_ext [5]
##   full_path                                  file_name_without… ext       n
##   <chr>                                      <chr>              <chr> <int>
## 1 /Users/frie/Documents/plus1/blog/static/m… P1120001           RW2       1
## 2 /Users/frie/Documents/plus1/blog/static/m… P1120002           JPG       2
## 3 /Users/frie/Documents/plus1/blog/static/m… P1120002           RW2       2
## 4 /Users/frie/Documents/plus1/blog/static/m… P1120003           JPG       2
## 5 /Users/frie/Documents/plus1/blog/static/m… P1120003           RW2       2
## 6 /Users/frie/Documents/plus1/blog/static/m… P1120006           JPG       2
## 7 /Users/frie/Documents/plus1/blog/static/m… P1120006           RW2       2
## 8 /Users/frie/Documents/plus1/blog/static/m… P1120008           RW2       1
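For comparison, here is a minimal sketch of the dplyr::add_count shorthand mentioned in the code comment, run on a small made-up tibble rather than the real file list. One difference worth noting: on an ungrouped input, add_count returns an ungrouped result, whereas the group_by + mutate version above stays grouped.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for the file table.
demo <- tibble::tibble(
  file_name_without_ext = c("P1120001", "P1120002", "P1120002", "P1120008"),
  ext = c("RW2", "JPG", "RW2", "RW2")
)

# Adds a column n with the number of rows per file_name_without_ext.
counted <- demo %>%
  dplyr::add_count(file_name_without_ext)

counted
```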

Now I filter for those rows where n == 1 - those are the RW2 files that are the leftover companions of the JPGs I deleted manually. Just to be sure, I also add the ext == "RW2" condition to the filter statement.3

delete_df <- df %>%
  dplyr::filter(n == 1 & ext == "RW2")

nrow(delete_df) # only 2 files left
## [1] 2
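The same question ("which RW2 files have no JPG partner?") can also be phrased as an anti-join instead of a count. The sketch below uses a small made-up tibble; it is an alternative formulation, not the approach used in this post.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for the file table.
demo <- tibble::tibble(
  file_name_without_ext = c("P1120001", "P1120002", "P1120002", "P1120008"),
  ext = c("RW2", "JPG", "RW2", "RW2")
)

raw_files <- demo %>% dplyr::filter(ext == "RW2")
jpg_files <- demo %>% dplyr::filter(ext == "JPG")

# Keep only the RW2 rows whose name has no match among the JPG rows.
orphans <- dplyr::anti_join(raw_files, jpg_files, by = "file_name_without_ext")

orphans
```

This version never selects a JPG, because it only ever returns rows from raw_files.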

Step 3: delete, delete, delete!

I use dplyr::pull to get the full_path variable from the data frame.4 I also add a small check that I indeed have only RW2 files - all this double-checking is getting a bit out of hand, but better safe than sorry. 😉

And finally: delete, delete, delete that sh*t with file.remove!

delete_paths <- delete_df %>%
  dplyr::pull(full_path)

print(delete_paths)
## [1] "/Users/frie/Documents/plus1/blog/static/media/data/2019-10-19-automate-the-boring-stuff/P1120001.RW2"
## [2] "/Users/frie/Documents/plus1/blog/static/media/data/2019-10-19-automate-the-boring-stuff/P1120008.RW2"
# some quick check
# don't delete JPG
stopifnot(all(stringr::str_ends(delete_paths, "RW2")))
stopifnot(length(delete_paths) == 2)

# delete!
# i commented this out to make it easier to reproduce this.
# file.remove(delete_paths) 
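If commenting the file.remove line in and out feels too fragile, one option is a tiny wrapper with a dry-run switch. The sketch below is only an illustration: the function name remove_raw_files and its dry_run argument are made up, and it runs on empty stand-in files in a temporary folder.

```r
# Hypothetical helper (not from the original post): delete RW2 files,
# but only report what would happen unless dry_run = FALSE.
remove_raw_files <- function(paths, dry_run = TRUE) {
  # Refuse to touch anything that is not a RW2 file.
  stopifnot(all(grepl("\\.RW2$", paths)))

  existing <- paths[file.exists(paths)]
  if (dry_run) {
    message("Would delete ", length(existing), " file(s)")
    return(invisible(existing))
  }

  removed <- file.remove(existing)  # logical vector, one entry per file
  invisible(existing[removed])
}

# Empty stand-in files in a throwaway folder.
demo_dir <- file.path(tempdir(), "delete-demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_paths <- file.path(demo_dir, c("P1120001.RW2", "P1120008.RW2"))
file.create(demo_paths)

remove_raw_files(demo_paths)                   # dry run: nothing is deleted
remove_raw_files(demo_paths, dry_run = FALSE)  # actually deletes the files
```

Defaulting to the dry run means an accidental call does no harm.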

This deletes the two files that do not have a JPG companion. In the real run, my script successfully deleted 258 files, as can be seen by comparing the before (posted at the beginning of this post) and after screenshots of my explorer.

Hurray for the power of computers! 🎉

The end

I don’t know whether this brought any considerable insight to anyone. 😄 After all, this is not the usual use case for R - a well-written shell command would’ve achieved the same. Or… actually manually deleting the files… But no, that was never an option.
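For the curious, the whole thing also fits into a few lines of base R without any tidyverse packages. This is a rough sketch under the same assumptions as above (RW2 and JPG files share a name and differ only in the upper-case extension); the function name find_orphan_raws is made up.

```r
# Hypothetical helper (not from the original post): return the full paths
# of all RW2 files in a folder that have no JPG with the same name.
find_orphan_raws <- function(folder) {
  paths <- list.files(folder, full.names = TRUE)
  stems <- tools::file_path_sans_ext(basename(paths))
  exts <- tools::file_ext(paths)

  is_raw <- exts == "RW2"
  has_jpg <- stems %in% stems[exts == "JPG"]
  paths[is_raw & !has_jpg]
}

# Empty stand-in files in a throwaway folder.
demo_dir <- file.path(tempdir(), "orphan-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(
  demo_dir,
  c("P1120001.RW2", "P1120002.JPG", "P1120002.RW2", "P1120008.RW2")
))

orphan_paths <- find_orphan_raws(demo_dir)
basename(orphan_paths)
```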

Take away from this? Being able to program makes you lazy - or rather it gives you the ability to be lazy by just automating everything away. 😎 👅 And in my opinion, this is just another excellent reason to: keep coding. ❤️

1. The double call could be avoided by splitting the full path using something like tidyr::separate but I was lazy.

2. Sidenote: There’s also tidyr::separate_rows which is even more awesome!

3. If I did my manual deletion process how I described it, this should not be necessary as a JPG should always have a “partner” RAW file. But who knows? 🤷

4. pull is just like $ - it just integrates better into pipe workflows. As I broke up the pipe for “educational” purposes, it does not really make sense here, but I thought I’d leave it in just in case someone did not know about it yet.