How to use R to remove audiobooks from your Spotify liked songs

Looking for the code? You can find it in this GitLab Repo!

I’m not a good Spotify citizen: I can listen to the same Spotify-curated “This is..” artist playlist for weeks and I rarely venture out to discover new music. And because I’m too lazy to curate and maintain my own playlists, I often lose track of songs / artists I enjoyed listening to at some point.

So I was over the moon when I realized a couple of weeks ago that there was a “Liked Songs” playlist¹ that I could “fill” by simply “hearting” a song. And even better, the playlist already contained over 1400 songs that I apparently had added … somehow.

I instantly added some more songs and started listening to the playlist while coding at the CorrelAid website. It was working - until my flow was interrupted by a narrator voice reading something… What?? I opened my Spotify app and unliked the “song”. But it happened again and again - apparently three whole audiobooks - each with > 40 tracks - had found their way into my “Liked Songs”.

I tried to solve the problem using the app: I liked and unliked the audiobooks but nothing worked - the “songs” did not disappear from my “Liked Songs” playlist. So of course, instead of “unliking” >200 songs by hand in the app, I decided to use my programming skills and the Spotify Web API to (semi-)automate the problem away.

First, of course, I loaded some packages. As always with R, there’s already an excellent API wrapper package for the Spotify Web API, the spotifyr 📦.

# spotify api package
library(spotifyr)
# usual suspects
library(dplyr)
library(purrr)
library(readr)
# for my own custom function to remove songs from playlist
library(httr)
library(usethis)
library(glue)

Get “Liked Songs”

First, I used the spotifyr 📦 to get the liked songs playlist from Spotify. To do so, I followed the instructions from the GitHub README to create an app and obtain the client id and client secret. I stored them in a local .Renviron file:

SPOTIFY_CLIENT_ID="myclientid"
SPOTIFY_CLIENT_SECRET="myclientsecret"

I loaded the contents of the .Renviron file with the base R function readRenviron and used spotifyr to obtain the access token:

readRenviron(".Renviron")
access_token <- spotifyr::get_spotify_access_token()
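
Before requesting a token, it can be worth checking that both variables actually made it into the session. Here is a minimal sketch in base R; spotify_credentials_set is a hypothetical helper I made up for illustration (it is not part of spotifyr), and the temporary file only stands in for a real .Renviron:

```r
# Hypothetical helper (not part of spotifyr): TRUE if both credentials
# are available as non-empty environment variables.
spotify_credentials_set <- function() {
  vars <- Sys.getenv(c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"))
  all(nzchar(vars))
}

# Stand-in for a real .Renviron file, using the placeholder values from above.
# Careful: this overwrites the two variables in the current session.
env_file <- tempfile()
writeLines(
  c('SPOTIFY_CLIENT_ID="myclientid"', 'SPOTIFY_CLIENT_SECRET="myclientsecret"'),
  env_file
)
readRenviron(env_file)

spotify_credentials_set()
```

If the check returns FALSE, the usual suspects are a typo in a variable name or a .Renviron file that sits in a different working directory.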

API Limits & Pagination

The access token is the “key” to interact with the Spotify API, so I was good to go. The spotifyr package thankfully offers a function for almost every endpoint of the Spotify API, including spotifyr::get_my_saved_tracks for the saved tracks.

Unfortunately, most endpoints do not return all items at once when called, but only up to a certain limit. In the case of spotifyr::get_my_saved_tracks, the API can only return a maximum of 50 tracks in response to a call. To work around this limit, I made use of the offset parameter. The offset tells the API “where to start” with returning the next 50 items. From the Spotify documentation:

limit: Optional. The maximum number of objects to return. Default: 20. Minimum: 1. Maximum: 50.

offset: Optional. The index of the first object to return. Default: 0 (i.e., the first object). Use with limit to get the next set of objects.

(source)

So instead of making one “big” call to get all saved tracks, I needed to make several smaller calls, while increasing the offset until I had “reached” the total number of tracks.

Conceptually:

  • call 1: offset = 0, limit = 50 -> gets tracks 1-50
  • call 2: offset = 50, limit = 50 -> gets tracks 51-100
  • call 3: offset = 100, limit = 50 -> gets tracks 101 - 150
  • continue until the offset reaches or exceeds the total number of liked tracks

In the context of APIs, such a pattern is called pagination.

To implement the pagination I needed to know the total number of tracks in the “liked songs” playlist. I could get it from the API by making a call to the me/tracks endpoint using the get_my_saved_tracks function with the include_meta_info parameter set to TRUE. This returned a total number of saved tracks of 1516.

# get total number of saved tracks and calculate the offsets (can only get 50 tracks with a call)
meta <- spotifyr::get_my_saved_tracks(limit = 50, offset = 0, include_meta_info = TRUE)
total <- meta$total # total number of saved tracks
total # 1516

Now, I could’ve implemented the conceptual pagination pattern using a while or repeat loop - after all, the “continue until” bullet point totally reads like it implies one. However, I decided to use a functional programming solution instead. Why? While while (haha!) loops are totally a-okay, writing functions forces me to think more about my code. You can read more about this in Advanced R.²
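
For comparison, here is a minimal sketch of what the while-loop version could look like. Since it should run without Spotify credentials, it does not call the real API: mock_get_saved_tracks and library_mock are stand-ins I invented for illustration, serving pages from a local data frame with 1516 rows.

```r
# Stand-in for the API (an assumption for illustration): a local data frame
# with as many rows as my liked songs, served in pages like the real endpoint.
total <- 1516
library_mock <- data.frame(track.id = sprintf("track_%04d", seq_len(total)))

mock_get_saved_tracks <- function(limit = 20, offset = 0) {
  first <- offset + 1
  last <- min(offset + limit, nrow(library_mock))
  if (first > last) {
    return(library_mock[0, , drop = FALSE]) # past the end: empty page
  }
  library_mock[first:last, , drop = FALSE]
}

# while-loop pagination: request pages as long as the offset is below the total
limit <- 50
offset <- 0
pages <- list()
while (offset < total) {
  pages[[length(pages) + 1]] <- mock_get_saved_tracks(limit = limit, offset = offset)
  offset <- offset + limit
}

# bind all pages into one data frame
all_tracks_loop <- do.call(rbind, pages)
length(pages)         # 31 pages
nrow(all_tracks_loop) # 1516 rows
```

The loop has to manage its own state (offset and the growing pages list), which is exactly the bookkeeping the functional version gets rid of.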

Functional approach

For my “functional approach” to work, I needed to calculate the vector of offsets ahead of time. In order to do so, I made use of the seq function which “generate[s] regular sequences”.

offsets <- seq(0, total + 50, 50)
offsets

Because I couldn’t run the API call above for knitting this blog post (I’ve already deleted the relevant tracks), here’s a mockup with the total number of tracks hardcoded:

# for the blog post
total_fake <- 1516
offsets_fake <- seq(0, total_fake + 50, 50)
offsets_fake
##  [1]    0   50  100  150  200  250  300  350  400  450  500  550  600  650  700
## [16]  750  800  850  900  950 1000 1050 1100 1150 1200 1250 1300 1350 1400 1450
## [31] 1500 1550

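Adding 50 to the total number of tracks is a safety margin rather than a necessity: seq(0, total, 50) would already stop at 1500, and the call with offset 1500 returns the last tracks (1501 to 1516). The extra offset 1550 only triggers one more call, which should come back empty.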
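
To double-check an offsets vector, it helps to compare its length with the number of pages that are actually needed, ceiling(total / 50). A minimal sketch with the hard-coded total; n_pages and offsets_exact are names introduced here for illustration and are not part of my original script:

```r
total_fake <- 1516
limit <- 50

# number of pages needed to cover all tracks
n_pages <- ceiling(total_fake / limit)
n_pages # 31

# offsets as computed above
offsets_fake <- seq(0, total_fake + 50, 50)
length(offsets_fake) # 32

# a tighter alternative (assumption: total_fake > 0)
offsets_exact <- seq(0, total_fake - 1, by = limit)
length(offsets_exact) # 31
max(offsets_exact)    # 1500
```

With 1516 tracks, 31 calls cover everything, so the offset 1550 accounts for one additional call beyond the last page.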

I then defined a simple wrapper function, get_chunk, that takes an offset as a parameter and feeds it to the get_my_saved_tracks function from the spotifyr package. I also added some simple logging. I then mapped the function over my offsets vector using map_dfr, which does two things:

  • It takes each offset from the offsets vector and feeds it to my get_chunk function, essentially working like a for loop and implementing the “conceptual” pagination from above.
  • It binds all resulting data frames together into one big data frame. If I had used a simple map, the return value would’ve been a list of data frames.

# define function to get 50 saved tracks depending on offset 
get_chunk <- function(offset) {
  new <- spotifyr::get_my_saved_tracks(limit = 50, offset = offset, include_meta_info = FALSE)
  usethis::ui_done(glue::glue("got from offset: {offset}"))
  return(new)
}

# map over offsets, bind to dataframe 
all_tracks <- purrr::map_dfr(offsets, get_chunk) 
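
To make the difference between “a list of data frames” and “one big data frame” more tangible, here is a toy sketch. It deliberately swaps in base R so that it runs without any packages: lapply plays the role of map, and do.call(rbind, ...) roughly corresponds to the row-binding that map_dfr does on top. get_chunk_toy and offsets_toy are made up for illustration.

```r
# toy stand-in for get_chunk (assumption: each call returns a small data frame)
get_chunk_toy <- function(offset) {
  data.frame(offset = offset, track.name = paste("song", offset + 1:2))
}

offsets_toy <- c(0, 2, 4)

# like purrr::map: one data frame per offset, collected in a list
chunks <- lapply(offsets_toy, get_chunk_toy)
length(chunks) # 3

# like the "_dfr" part: stack the data frames row-wise into one
all_toy <- do.call(rbind, chunks)
dim(all_toy) # 6 rows, 2 columns
```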

Finally, I wrote the data to disk to make sure that I could use them for this blog post 😉.

# write to disk
readr::write_rds(all_tracks, "saved_tracks.rds")

Find the audiobooks 🔎

Here is a glimpse at the data:

all_tracks <- readr::read_rds("saved_tracks.rds")
glimpse(all_tracks)
## Observations: 1,516
## Variables: 30
## $ added_at                           <chr> "2020-08-11T14:00:47Z", "2020-08-1…
## $ track.artists                      <list> [<data.frame[2 x 6]>, <data.frame…
## $ track.available_markets            <list> [<"AD", "AE", "AL", "AR", "AT", "…
## $ track.disc_number                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ track.duration_ms                  <int> 296213, 186101, 178666, 187931, 23…
## $ track.explicit                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
## $ track.href                         <chr> "https://api.spotify.com/v1/tracks…
## $ track.id                           <chr> "6L5iRhYgVPaEFqmGaVxWrN", "3HZcjYk…
## $ track.is_local                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
## $ track.name                         <chr> "Хочу перемен", "Cliff's Edge - La…
## $ track.popularity                   <int> 41, 26, 0, 83, 73, 21, 21, 0, 16, …
## $ track.preview_url                  <chr> "https://p.scdn.co/mp3-preview/858…
## $ track.track_number                 <int> 1, 3, 9, 1, 1, 17, 3, 18, 7, 10, 8…
## $ track.type                         <chr> "track", "track", "track", "track"…
## $ track.uri                          <chr> "spotify:track:6L5iRhYgVPaEFqmGaVx…
## $ track.album.album_type             <chr> "album", "single", "album", "singl…
## $ track.album.artists                <list> [<data.frame[2 x 6]>, <data.frame…
## $ track.album.available_markets      <list> [<"AD", "AE", "AL", "AR", "AT", "…
## $ track.album.href                   <chr> "https://api.spotify.com/v1/albums…
## $ track.album.id                     <chr> "7trila5XMOsUUkcujWqzcn", "4Rqfc8w…
## $ track.album.images                 <list> [<data.frame[3 x 3]>, <data.frame…
## $ track.album.name                   <chr> "Виктор Цой 55 (Выпуск в честь 55-…
## $ track.album.release_date           <chr> "2017-06-21", "2016-03-25", "1990-…
## $ track.album.release_date_precision <chr> "day", "day", "day", "day", "day",…
## $ track.album.total_tracks           <int> 55, 5, 27, 1, 2, 23, 5, 24, 20, 10…
## $ track.album.type                   <chr> "album", "album", "album", "album"…
## $ track.album.uri                    <chr> "spotify:album:7trila5XMOsUUkcujWq…
## $ track.album.external_urls.spotify  <chr> "https://open.spotify.com/album/7t…
## $ track.external_ids.isrc            <chr> "FR59R1744876", "USAT21600938", "N…
## $ track.external_urls.spotify        <chr> "https://open.spotify.com/track/6L…

track.type or track.album.type seemed like an obvious choice to find out which tracks belonged to an audiobook. However:

print(table(all_tracks$track.type))
## 
## track 
##  1516
table(all_tracks$track.album.type)
## 
## album 
##  1516

Unfortunately, it seems like the data did not offer an indicator for whether the track belonged to an audiobook or not - in Spotify’s eyes, there’s no difference between music albums and audiobooks.

Hence, it was time for a good old heuristic: I decided to look at the album lengths (more precisely, the summed duration of my liked tracks per album) because usually, audiobooks are quite long compared to “normal” music albums. Tidyverse to the rescue:

# group by album, sort by duration 
all_tracks_by_album <- all_tracks %>% 
  group_by(track.album.id, track.album.name) %>%  # group by album id and album name (only the id would be necessary, but i wanted to keep both)
  summarize(total_duration_album =  sum(track.duration_ms)) %>% # sum up all the tracks
  arrange(desc(total_duration_album)) # sort descending 

# determine what are the audiobooks by looking at the data
# the longest albums should be the audiobooks 
knitr::kable(head(all_tracks_by_album, 10)) 
|track.album.id         |track.album.name                                          | total_duration_album|
|:----------------------|:---------------------------------------------------------|--------------------:|
|2Hso705hbz70g2ywUyBSXK |Über uns der Himmel, unter uns das Meer (Gekürzte Lesung) |             29832918|
|6DBCctTaza5w2rWrkK1I1D |Inferno                                                   |             25746930|
|00fshMmQEnqmP8Gja8aEe4 |Das Joshua-Profil                                         |             23468946|
|1kLscSc6HEAonyvwbZO3XK |Love Actually                                             |             11707091|
|1716XPsNUeHok477AtTRhX |Best of Classical - Die 50 größten Werke der Klassik      |             11127462|
|7xl50xr9NDkd3i2kBbzsNZ |Stadium Arcadium                                          |              9103257|
|3CBMpoI2vZlKXs3wgnNWGn |20 The Greatest Hits                                      |              8786053|
|4jytUDY4LPrvwkReW4S2gE |Greatest Hits 1992-2010 Es asì                            |              8052950|
|2OXv5X4J2y9CQ7eVSNEHad |Greatest Hits 1992-2010 E da qui                          |              8040528|
|3dVI5svXoD3X3HR2Y4P1qt |Projekt Seerosenteich (Live - Deluxe Version)             |              7171975|

I instantly recognized the first three entries as the annoying audiobooks that had kept popping up in my “Liked songs” playlist. 🎉
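
Milliseconds are hard to read, so here is a small sketch that converts the four longest totals from the table into hours. The values are copied from the output above; the 5-hour cut-off is a hypothetical threshold I picked by eye for this data, not something Spotify provides.

```r
# total durations of the four longest albums in ms, named by album id
durations_ms <- c(
  "2Hso705hbz70g2ywUyBSXK" = 29832918, # Über uns der Himmel, unter uns das Meer
  "6DBCctTaza5w2rWrkK1I1D" = 25746930, # Inferno
  "00fshMmQEnqmP8Gja8aEe4" = 23468946, # Das Joshua-Profil
  "1kLscSc6HEAonyvwbZO3XK" = 11707091  # Love Actually
)

# convert milliseconds to hours
durations_h <- durations_ms / (1000 * 60 * 60)
round(durations_h, 1) # 8.3, 7.2, 6.5 and 3.3 hours

# hypothetical cut-off of 5 hours (an assumption, chosen by eye)
names(durations_h)[durations_h > 5]
```

The gap between 6.5 hours (the shortest audiobook) and 3.3 hours (the longest music album) is what makes the duration heuristic work for my library; other libraries may need a different cut-off or a manual look, as I did here.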

Remove the audiobooks from the Liked Songs

To remove the tracks from the audiobooks, I needed all their IDs. First, I extracted the album ids from the audiobooks:

# select the audiobooks / the n longest albums and extract the ids
audiobooks_id <- all_tracks_by_album %>% 
  head(3) %>% # from the manual investigation, i had three audiobooks
  pull(track.album.id)
audiobooks_id
## [1] "2Hso705hbz70g2ywUyBSXK" "6DBCctTaza5w2rWrkK1I1D" "00fshMmQEnqmP8Gja8aEe4"

Then, I filtered the original all_tracks data frame for those albums to get all the track IDs that I wanted to delete:

# filter tracks belonging to audiobooks and extract the ids we need to delete
to_delete <- all_tracks %>% 
  filter(track.album.id %in% audiobooks_id)
to_delete_ids <- to_delete$track.id # extract the ids
length(to_delete_ids) 
## [1] 327

Now, the only thing left was the actual deletion of the tracks from my “Liked Songs” playlist. Unfortunately, this is not in the scope of spotifyr so I had a look at the relevant API docs, took inspiration from spotifyr source code (for the authentication part) and implemented a small function quick-and-dirty style - without any error handling or retry mechanisms 🙈 :

# define function to delete ids (limited to 50 at a time) -> not part of spotifyr
# cf https://developer.spotify.com/documentation/web-api/reference/library/remove-tracks-user/
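delete_ids <- function(ids) {
  httr::DELETE(
    "https://api.spotify.com/v1/me/tracks",
    # authenticate with the authorization code flow (as spotifyr does)
    config = httr::config(token = spotifyr::get_spotify_authorization_code()),
    # the endpoint expects the ids as one comma-separated string
    query = list(ids = paste0(ids, collapse = ","))
  )
}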
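
For the record, a slightly more defensive variant could look like the sketch below. It is an assumption about how I might harden the function, not something I ran against the live API: build_ids_query and delete_ids_checked are hypothetical names, httr::RETRY repeats the request on failure, and httr::stop_for_status turns HTTP error codes into R errors.

```r
# hypothetical helper: validate a chunk of ids and build the
# comma-separated query value (this endpoint takes at most 50 ids per call)
build_ids_query <- function(ids) {
  stopifnot(is.character(ids), length(ids) >= 1, length(ids) <= 50)
  paste0(ids, collapse = ",")
}

# sketch of a more defensive delete function (untested against the live API)
delete_ids_checked <- function(ids) {
  res <- httr::RETRY(
    "DELETE",
    "https://api.spotify.com/v1/me/tracks",
    config = httr::config(token = spotifyr::get_spotify_authorization_code()),
    query = list(ids = build_ids_query(ids)),
    times = 3 # at most three attempts in total
  )
  httr::stop_for_status(res) # turn HTTP errors (4xx / 5xx) into R errors
  invisible(res)
}

# the helper can be tried without touching the API
build_ids_query(c("id1", "id2")) # "id1,id2"
```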

Because this endpoint was limited as well, I had to use some dark stackoverflow magic to split the to_delete_ids vector with 327 track ids into 7 chunks of at most 50 ids each (six full chunks plus one with the remaining 27):

# can only delete 50 at once, so split
del_groups <- split(to_delete_ids, ceiling(seq_along(to_delete_ids) / 50 )) # from https://stackoverflow.com/a/3321659 
str(del_groups)
## List of 7
##  $ 1: chr [1:50] "0r8CnP1ri7Op1K6pYBAIIS" "04cWxUJpQNmQzPx3oerRIe" "0TcwSjGcRLP0qANZ0pn5S2" "0K0UOpucV0mUMEVpvVioqI" ...
##  $ 2: chr [1:50] "4999R4NWDhX4dxHuexgRQk" "1b5t5yfZL0gtw7kBO37Cag" "3E46vLZaOoiGVGwLEnfqae" "44qSTlrLcvZpZL5bipSU6g" ...
##  $ 3: chr [1:50] "5dcDqtSNzKmi7X6leDTGji" "7HTLLCS0GuEFt6mZksBNPK" "7ddOMjwAggaPDArUtwjbgz" "5DANy9Hla7MaWtJQpdVPVI" ...
##  $ 4: chr [1:50] "2vcGDeYkUJej4R7hUkUgYd" "4SqR4H9THJNTB0JQMcipwy" "2O6MlAfSk6I070p0zvV7qr" "2kM5gjsLeaeHZFTlDsYqBC" ...
##  $ 5: chr [1:50] "0qWX4kYBRQYr0HjDAIgIHh" "0Nvzj7ma1dDrxREoYv5cpb" "0XaoLn8FXHGb1fhytPAtcl" "0mpjzZA3jjzHOvvIewzixs" ...
##  $ 6: chr [1:50] "4IiEy7SnLy8jVwaxNHExsU" "1kcPZTzNfOp4vCa4fvWHJa" "1ZikjFqoNWhPjRrJBwyBPU" "2sQoA08e4hezq0rbpRaFqf" ...
##  $ 7: chr [1:27] "4gwPvHVH7Rvz6nZ52CTio5" "5Mmn5wr79RVIlXjSR43Tep" "6E6gHBhv9t8wvdssiTzzmb" "5hSjrtbCWIWeLFmkLwNfOf" ...
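
The “magic” is less dark than it looks. Here is a minimal sketch on a toy vector of seven letters with a chunk size of 3 (both chosen only for illustration):

```r
x <- letters[1:7]

# seq_along(x) is 1..7; dividing by the chunk size and rounding up
# yields one group label per element
groups <- ceiling(seq_along(x) / 3)
groups # 1 1 1 2 2 2 3

# split() then collects the elements that share a label
chunks <- split(x, groups)
lengths(chunks) # 3, 3 and 1 elements
```

With 327 ids and a chunk size of 50, the same arithmetic yields the labels 1 to 7, which is where the seven list elements above come from.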

Finally, I used map to apply my function to the chunks:

# apply function
del_groups %>% 
  purrr::map(delete_ids)

Thankfully, it worked out of the box: those annoying audiobook tracks were gone from my “Liked Songs” playlist and I was a happy coder again. 🙏

What’s next?

I definitely want to “optimize” my Liked Songs playlist even further. For example, there are a lot of complete albums in it which are artifacts from liking whole albums instead of individual songs. Ideally, I would like to have access to the stats on how often I listened to each song so that I could just pick out the songs that made me like the albums. But it seems like there is no way to access those stats because of GDPR. So I might end up building a sort of interactive CLI with usethis which allows me to quickly accept or reject songs from the playlist. Maybe I can even integrate this with the player so that I could listen to a song for a few seconds before making my decision.

Or…I might become a lazy Spotify citizen again and drop this whole project 🤷 🙈

In either case…until next time: Keep coding ❤️

The code

The complete code is here. I adapted it slightly for this blog post but it should (hopefully) work.


  1. Someone please give an “Introduction to Spotify” course…I need it.

  2. I have a blog post draft about this topic…I hope I get around to publishing it at some point!