This week in ISYS6621, I will be sharing my findings on data mining of social media data. From how users generate data by sharing information and spending time on their favorite platforms, to how that data is collected and stored, to how it is finally computed upon and turned into insight, I’ve stumbled upon some astounding information recently. It turns out Facebook processes nearly 2.5 million posts per minute, while 72 hours of video are uploaded to YouTube in that same minute.
Data centers storing our senseless photo albums from the Summer of 2007 have military-grade security. Predictive algorithms analyzing the data we’re producing have made leaps and bounds since the term ‘big data’ came about around 2008, but even though computers may understand parts of our lives better than we do, there’s one thing computers have continued to struggle with.
What I found most interesting in all of my research was the topic of recommendation algorithms. Just off the top of my head, I can picture seeing “Suggested Apps…” in my Twitter timeline, “People you may know…” on LinkedIn, “Recommended Games” on Facebook and plenty more. Whether supporting the service’s growth model or its advertising platform, recommendations have long been the backbone of social media platforms. Realistically though, how often are these recommendations actually things we like? In the case of advertisements, they’re often so obnoxious that even if I were being recommended an app destined to be the next Instagram, I wouldn’t care. And with “people you may know,” the results are often so trivial that a good match doesn’t surprise anyone anymore. Oh, we have over 100 friends in common? Yeah, we might know each other. But what about when the software we use is genuinely giving us its best guess at what we might like?
It’s a problem that has to take a multitude of factors into account to be solved accurately, the kind of puzzle that has consistently stumped data scientists. A few years back, Netflix even hosted a challenge offering $1,000,000 to whoever could build the best recommendation software to keep its users on the couch even longer than they already were. Until recently, I was convinced that there never really would be a program capable of knowing what I liked better than I did myself.
When I first saw the “Discover Weekly” playlist pop up on my Spotify, my gut feeling was that it was a PR move to take the fresh launch of Apple’s Beats Radio out of the spotlight. Regardless, I tried it and found myself pleasantly surprised. In its debut, the 30-song playlist worked as advertised for me. I wound up saving about one third of the playlist’s songs over the course of the week, and by the time I woke up the following Monday I had a fresh Discover Weekly waiting for me.
Somehow Spotify had done it again, finding bands I had never heard of yet music I loved. On the third week, however, a “known bug” kept me from my Monday pickup of a new Discover Weekly playlist. That afternoon I stumbled upon a BuzzFeed article entitled “Spotify’s Discover Weekly Updated Late And People Were Furious,” and that’s when I knew that Spotify was onto something big.
I clearly wasn’t the only one relying on Spotify’s computers to tell me what I love, and for me at least, this felt like the first major triumph for recommendation algorithms. My hunch was all but confirmed today when I saw that The Verge had published a longform article called “Tastemaker: How Spotify’s Discover Weekly cracked human curation at internet scale.” The article itself is quite thorough and well worth the read, but I’ve decided to include a few paragraphs from it that do a great job of getting at how Discover Weekly has become so successful for so many.
The technology that makes Discover Weekly possible comes in part from a Boston-based startup called The Echo Nest, which Spotify acquired in March of 2014… The company became one of the best in the business and helped power recommendation systems for Rdio, Spotify, Deezer, iHeartRadio, and Rhapsody. But it never had a massive user base of its own that it could leverage to build new tools. “You have really good people, you have some really good algorithms. They’re only [as] good as the data that you have,” says professor Downie. That changed when The Echo Nest became part of Spotify and could tap its 75 million users.
The combination of The Echo Nest technology and Spotify’s massive data trove led to Discover Weekly. Here’s how it works: Spotify has built a taste profile for each user based on what they listen to. It assigns an affinity score to artists, which is the algorithm’s best guess of how central they are to your taste. It also looks at which genres you play the most to decide where you would be willing to explore new music.
The algorithm behind Discover Weekly finds users who have built playlists featuring the songs and artists you love. It then goes through songs that a number of your kindred spirits have added to playlists but you haven’t heard, knowing there is a good chance you might like them, too. Finally, it uses your taste profile to filter those findings by your areas of affinity and exploration.
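To make those three steps concrete for myself, I put together a minimal Python sketch. It is only my own toy reading of the description above, not Spotify’s actual code: the function name `recommend`, the overlap-based vote weighting, the genre filter standing in for the “taste profile,” and all of the song and genre names are assumptions I made up for illustration.

```python
from collections import Counter


def recommend(my_songs, my_genres, playlists, song_genre, top_n=3):
    """Rank songs I haven't heard by votes from playlists that share my taste.

    Hypothetical toy model: every name and weighting here is an assumption.
    """
    votes = Counter()
    for playlist in playlists:
        tracks = set(playlist)
        # A playlist only counts as a "kindred spirit" if it contains
        # at least one song I already love.
        overlap = len(tracks & my_songs)
        if overlap == 0:
            continue
        # Each unheard song on that playlist gets a vote, weighted by how
        # much the playlist overlaps with my own library.
        for song in tracks - my_songs:
            votes[song] += overlap
    # Stand-in for the taste profile: keep only songs in genres I play
    # or would be willing to explore.
    candidates = [s for s in votes if song_genre.get(s) in my_genres]
    # Highest vote count first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda s: (-votes[s], s))
    return candidates[:top_n]


my_songs = {"a", "b", "c"}
my_genres = {"indie", "folk"}
playlists = [
    ["a", "b", "x", "y"],  # shares two songs with me
    ["b", "c", "x", "z"],  # shares two songs with me
    ["q", "r"],            # shares nothing, so it is ignored
    ["a", "y", "m"],       # shares one song with me
]
song_genre = {"x": "indie", "y": "folk", "z": "metal", "m": "indie"}

picks = recommend(my_songs, my_genres, playlists, song_genre)
print(picks)  # ['x', 'y', 'm']
```

Song “z” collects votes but gets filtered out because metal is not in my made-up taste profile, which is roughly the role the article gives to the affinity and exploration filter.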
In a sense, the system works like the original Page Rank (named for Larry Page), the technique Google used to revolutionize web search. Page Rank crawled the web to find hyperlinks and treated each one as a vote pointing toward useful information. A big batch of links pointing to a website about Elvis indicated to Google that the site was a good resource on The King. In Discover Weekly, each time a user with similar taste playlists a certain song, it’s a vote that the song will sound good to you when paired with other tracks on that playlist.
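Since the comparison leans on the “links as votes” idea, here is a small sketch of it in Python. This is a simplified textbook-style power iteration, not Google’s production system; the three-page “web,” the function name `pagerank`, the damping factor of 0.85 (a commonly used value) and the fixed 50 iterations are all assumptions I chose for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration on a dict of page -> outgoing links."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: two pages link to the Elvis site, which links to one.
links = {
    "fan_blog": ["elvis_site"],
    "news": ["elvis_site"],
    "elvis_site": ["news"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

The Elvis site comes out on top because it collects the most incoming votes. Swap pages for songs and hyperlinks for playlist placements and you get the flavor of the analogy the article is drawing.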
I may not have the credentials of Spotify’s Discover Weekly computers, but I highly recommend giving this playlist a chance.