• Nilz@sopuli.xyz

    Download all existing literature to build a library for preservation and you’re called a pirate. Download all existing literature from aforementioned library to train an LLM and you’re a tech innovator. What a strange world we live in.

    • P03 Locke@lemmy.dbzer0.com

      Download all existing literature to build a library for preservation and you’re called a pirate.

      Said library contains petabytes of the exact text of each and every piece of literature.

      Download all existing literature from aforementioned library to train an LLM and you’re a tech innovator.

      Said model contains gigabytes of a bunch of weights that can never go back to the exact words of the book.

      What a strange world we live in.

      It’s not strange at all; it’s a matter of degrees of compression. Compress a JPEG until it’s unrecognizable and it no longer infringes copyright. Training is essentially like trying to rewrite a book you just read from memory.
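
      To make the “degrees of compression” point concrete, here’s a minimal Python sketch (using Pillow; the input file name is hypothetical) that re-saves the same image at ever-lower JPEG quality and prints how small the result gets:

      ```python
      # Toy illustration of "degrees of compression" -- not a claim about how
      # LLM training works. Re-save an image at decreasing JPEG quality and
      # watch how little of the original survives.
      import io
      from PIL import Image

      original = Image.open("page_scan.jpg").convert("RGB")  # hypothetical input

      for quality in (95, 50, 10, 1):
          buf = io.BytesIO()
          original.save(buf, format="JPEG", quality=quality)
          print(f"quality={quality:3d}: {buf.tell():,} bytes")
      # At quality=1 the file is a small fraction of its original size and any
      # text in the image is no longer legible: the "exact words" are gone.
      ```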

      • hexagonwin@lemmy.sdf.org

        So you’re saying degrading quality while getting filthy rich by stealing everyone else’s work is better than archival efforts? Not sure what your point is.

  • Hideakikarate@sh.itjust.works

    However, these existing efforts have some major issues:

    Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.

    Later…

    We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).

    I must be kinda stupid, but it sounds to me like there’s some doublespeak: “only popular music gets preserved, so we preserved music by popularity.”

    • Lojcs@piefed.social

      To be fair, the 10k is just a sample. The actual count is 86 million tracks, about a quarter of all Spotify songs.

      Put another way, for any random song a person listens to, there is a 99.6% likelihood that it is part of the archive. We expect this number to be higher if you filter to only human-created songs. Do remember though that the error bar on listens for popularity 0 is large.

      For popularity=0, we ordered tracks by a secondary importance metric based on artist followers and album popularity, and fetched in descending order.

      We have stopped here due to the long tail end with diminishing returns (700TB+ additional storage for minor benefit), as well as the bad quality of songs with popularity=0 (many AI generated, hard to filter).

      Also it sounds like they had difficulty scraping some of the less popular songs and got them from somewhere else.
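
      As a back-of-the-envelope check on that 99.6% figure, here’s a small Python sketch with made-up numbers (a Zipf-like popularity curve, not Anna’s Archive’s actual data) showing how a quarter of all tracks can carry nearly all listens:

      ```python
      # Listens follow a heavy-tailed distribution, so track-count coverage
      # (25%) and listen-weighted coverage (99%+) are very different numbers.
      # Catalogue size and Zipf exponents are assumptions for illustration.
      import numpy as np

      n_tracks = 1_000_000                           # stand-in catalogue size
      ranks = np.arange(1, n_tracks + 1, dtype=float)
      top_quarter = n_tracks // 4

      for s in (1.0, 1.2, 1.5):                      # candidate Zipf exponents
          plays = ranks ** -s
          plays /= plays.sum()
          print(f"exponent {s}: top 25% of tracks carry "
                f"{plays[:top_quarter].sum():.1%} of listens")
      ```

      How close this gets to 99.6% depends entirely on how heavy-tailed real listening is, which is presumably why they flag the large error bar at popularity 0.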

    • Kaul@lemmy.dbzer0.com

      It’d probably be more beneficial to read the article directly on Anna’s Archive, where they display plenty of graphs and infographics to make the data understandable. Unfortunately, this article has none of that. The “over-focus on popular artists” quite literally means they’re only missing artists who aren’t being listened to, most of whom are probably AI anyway.

      https://annas-archive.li/blog/backing-up-spotify.html

  • hurtn@lemmy.dbzer0.com

    Trying to locate individual tracks in massive torrent files of presumably tens of thousands of tracks each sounds horrible. Metadata and tracks are located in different areas, and the audio is re-encoded to OGG Opus.

    For this to be useful to me, I would have to spend about $6,000 on hard drives ($20/terabyte × 300 TB), then convert the files to MP3, and somehow rename the files to their original song and artist names and create appropriate directories.

    I don’t think this is practical.

    https://annas-archive.li/blog/backing-up-spotify.html
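
    For what it’s worth, the convert-and-rename step is scriptable if a metadata index exists. A rough Python sketch, assuming a hypothetical index.csv that maps each archived file to artist/album/title (the real torrents’ metadata layout may differ):

    ```python
    # Re-encode Opus files to MP3 and sort them into Artist/Album/Title.mp3,
    # driven by a hypothetical index.csv with columns: file,artist,album,title.
    import csv
    import subprocess
    from pathlib import Path

    SRC = Path("spotify_dump")   # hypothetical: where the extracted .ogg files live
    DST = Path("library")

    with open("index.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            out_dir = DST / row["artist"] / row["album"]
            out_dir.mkdir(parents=True, exist_ok=True)
            # ffmpeg re-encodes to MP3; -map_metadata 0 carries tags across.
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(SRC / row["file"]),
                 "-map_metadata", "0", "-codec:a", "libmp3lame",
                 "-qscale:a", "2", str(out_dir / f"{row['title']}.mp3")],
                check=True,
            )
    ```

    That still doesn’t solve the 300 TB problem, of course.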

    • fonix232@fedia.io

      Or stop being an idiot and consider using self-hosted media solutions that handle the metadata for you, like Plex, Jellyfin, or any of the roughly three dozen options here.

      The right torrent client will also allow you to pick and choose which files to download, and you could even go a step further and add a new source provider to e.g. Lidarr that would handle these torrent files and pick out the music you want.

      Result?

      • no need to transcode to MP3 (not sure why you’d want to do that anyway, when Opus files can be played by practically any modern device)
      • no need to do any manual renaming
      • no need to manually get metadata
      • no need to get 300TB storage

      Hell, if you really wanted to, you could even vibe-code a solution that includes a torrent client, these music torrents, and a web interface + API providing all the necessary info for existing clients, essentially a quasi-Spotify alternative that only downloads music you actually listen to.
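
      For the “pick and choose which files to download” part, here’s a minimal sketch using libtorrent’s Python bindings (the torrent file name and search string are hypothetical):

      ```python
      # Download only the matching tracks from one of the huge music torrents
      # by zeroing the priority of every other file.
      import time
      import libtorrent as lt

      info = lt.torrent_info("music_batch_0001.torrent")  # hypothetical name
      ses = lt.session()
      h = ses.add_torrent({"ti": info, "save_path": "./downloads"})

      wanted = "artist name - track title"                # hypothetical query
      h.prioritize_files([
          4 if wanted in info.files().file_path(i).lower() else 0  # 0 = skip
          for i in range(info.num_files())
      ])

      while not h.status().is_finished:  # finished = all selected files done
          print(f"{h.status().progress:.1%} downloaded", end="\r")
          time.sleep(1)
      ```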