How Companies Collect Your Data

In 2012, the New York Times published an article detailing how Target collected data on its shoppers to determine whether they were pregnant. By examining a woman’s shopping patterns, Target could determine whether she was expecting and then send her promotional material and coupons. Target became a pioneer in advanced data mining and personalized marketing. Before the age of the Internet, marketers advertised to the general public. Since the introduction of social media, online shopping and GPS, marketing has become personal.


Data Exhaust

The problem many companies face is not that they have too little of your data; it’s that they have too much. This is known as data exhaust, and it’s the reason companies hire scores of data analysts and marketers to analyze this metadata. In 2010, we as a species were creating more data per day than we had created from the beginning of time until 2003. In 2015 alone, 76 exabytes of data will travel across the Internet.

Companies collect this data using various methods (which I’ll talk about shortly), and begin to build profiles around each consumer. For every consumer who uses a credit card, downloads an app, or sends a text, there is some corporate record of that transaction. You might be log #688330 in Target’s database or customer #0023837 within Uber’s system. As time passes, companies can begin to build big data sets. 

Big data sets derive value, in part, from the inferences that can be made from them. Some of these can be obvious. If you have someone’s detailed location data over the course of a year, you can infer what their favorite stores are. If you have a list of their calls and emails, you can infer who their friends are. Some companies, as in the case of Target, can discover more subtle facts about you. Ethnicity, religion, drinking history, sexual orientation: these are all inferences companies can draw from your transactions and actions.
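To make the "obvious" kind of inference concrete, here is a minimal Python sketch of guessing someone's favorite stores from a visit log. Everything in it is my own illustration: the `visits` records, the `user_688330` ID and the `favorite_stores` function are hypothetical, and a real pipeline would first have to turn raw location pings into store visits.

```python
from collections import Counter

# Hypothetical visit log: (user_id, store) pairs already derived from location data.
visits = [
    ("user_688330", "Target"),
    ("user_688330", "Starbucks"),
    ("user_688330", "Target"),
    ("user_688330", "Subway"),
    ("user_688330", "Target"),
    ("user_688330", "Starbucks"),
    ("user_0023837", "Kmart"),
]

def favorite_stores(visits, user_id, top_n=2):
    """Return the top_n most frequently visited stores for one user."""
    counts = Counter(store for uid, store in visits if uid == user_id)
    return [store for store, _ in counts.most_common(top_n)]

print(favorite_stores(visits, "user_688330"))
```

The subtler inferences (pregnancy, religion and so on) rely on far more elaborate models, but the starting point is the same: count what a person does and look for patterns.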

It may seem quite surprising that a team of data analysts could draw so many inferences from your online data, but there is an even more surprising number of ways companies can collect this data.


Location, Location, Location 

In recent years, collecting location-based data has been a proverbial gold rush for corporations. Practically everyone and their mother has a smartphone with GPS location tracking. Location-based apps have given us a multitude of benefits, from online delivery to ride-sharing services. However, many companies aren’t exactly transparent about when they’re collecting your location data.

In 2012, the free flashlight app Brightest Flashlight Free came under scrutiny when it was discovered that the app was collecting location data from its 50 million users and selling it to third parties. Brightest Flashlight had snuck a clause into its user agreement that almost none of its users read. The matter was resolved when the US Federal Trade Commission got involved, but it showed consumers that even something as simple as a flashlight app might be collecting information on its users.

Other apps and services have taken similarly sneaky approaches to collecting your data. In 2013, Jay-Z and Samsung teamed up to offer people who downloaded an app the ability to listen to Jay’s new album early. However, the app required the ability to view all accounts on the phone, track the phone’s location and track who the user was talking to. Amazon quietly and constantly collects location data on you via Kindle. The Angry Birds app even collects location data when the app is off.

With this location data, marketers can use a technique called “geofencing” to identify people who are near a particular business and deliver an ad to them. A single geofencing company, Placecast, delivers location-based ads to ten million phones for retailers like Kmart, Starbucks, and Subway. Microsoft does the same thing to people passing within ten miles of its stores.
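At its core, a geofence is just a distance check between a phone and a business. The sketch below is my own simplified illustration in Python, not how Placecast or Microsoft actually implement it: the function names and coordinates are made up, and it uses the standard haversine formula to approximate distance on the Earth's surface.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def inside_geofence(phone, store, radius_miles=10.0):
    """True if the phone's (lat, lon) is within radius_miles of the store's (lat, lon)."""
    return haversine_miles(*phone, *store) <= radius_miles

store = (42.0, -71.0)          # hypothetical store location
nearby_phone = (42.1, -71.0)   # roughly 7 miles north of the store
distant_phone = (43.0, -71.0)  # roughly 69 miles north of the store

print(inside_geofence(nearby_phone, store))   # inside the ten-mile fence
print(inside_geofence(distant_phone, store))  # outside the fence
```

A real system would run a check like this against a stream of location updates and trigger an ad when a phone crosses into the fence.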

Some retailers take it a step further and actually track your physical location within their stores. Using a combination of Bluetooth IDs, MAC addresses and security cameras, retailers can view which aisles you walk down and which displays you stop at. The goal of this monitoring is to combine the patterns of hundreds of shoppers into a heat map, which can reveal the effectiveness of certain displays and aisles.
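The heat map idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: shopper positions are taken to be (x, y) coordinates in meters on a store floor plan, and `build_heat_map` and the sample data are hypothetical.

```python
from collections import Counter

def build_heat_map(positions, cell_size=1.0):
    """Count how many observed positions fall into each grid cell of the floor plan."""
    heat = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        heat[cell] += 1
    return heat

# Hypothetical position samples pooled from many shoppers.
positions = [(0.5, 0.5), (0.7, 0.2), (3.1, 4.9), (0.9, 0.9)]

heat = build_heat_map(positions)
for cell, count in heat.most_common():
    print(cell, count)
```

Cells with the highest counts are where shoppers linger, which is the signal a retailer would read as an effective display or a busy aisle.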


Clicks and Quizzes

Location-based data collection is only one facet of how companies can collect your information. Another tool data-miners utilize is the “Cookie.” Cookies save a small record of the websites you’ve visited and the links you’ve clicked on within the web browser you’re using. For a standard fee, companies can set their own cookies on pages belonging to other sites. This is known as the “Third-Party Cookie.” Have you ever wondered why you keep seeing the same ad over and over again, no matter what website you’re on? This is a third-party cookie tracking your behavior across multiple sites. Companies like Rubicon Project and DoubleClick help advertisers target individual users across multiple sites.
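Here is a toy Python simulation of why a third-party cookie lets one ad company follow you across sites. It is my own simplification, not any real ad network's code: the `AdTracker` class and the example site names are hypothetical, and real browsers and ad servers exchange cookies over HTTP headers rather than function calls.

```python
import uuid

class AdTracker:
    """A pretend ad network whose ad is embedded on many unrelated sites."""

    def __init__(self):
        self.history = {}  # cookie ID -> list of sites where the ad was loaded

    def serve_ad(self, site, cookie=None):
        """Log the visit; issue a new cookie ID if the browser did not send one."""
        if cookie is None:
            cookie = uuid.uuid4().hex
        self.history.setdefault(cookie, []).append(site)
        return cookie

tracker = AdTracker()

# First visit: the browser has no cookie for the tracker yet, so it receives one.
cookie = tracker.serve_ad("shoes.example")

# Later visits to other sites carrying the same tracker send that cookie back.
tracker.serve_ad("news.example", cookie=cookie)
tracker.serve_ad("recipes.example", cookie=cookie)

print(tracker.history[cookie])
```

Because the same ID comes back from every site that embeds the tracker, the ad network ends up with a single browsing history for you, which is what makes the same ad follow you around.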

[Image: retargeting ad screenshot]

I went to once and now this ad is stalking me


One of the easiest ways for companies to collect data is to simply have you give it to them. Believe it or not, BuzzFeed has saved almost every answer and response to a majority of its quizzes. BuzzFeed can then take this free information and sell it to third parties without the quiz taker’s knowledge. Most of the information garnered from these quizzes is useless (e.g. Which Ousted Arab Spring Ruler Are You?), yet some of it can provide valuable information on an individual’s financial status and identity. In 2014 BuzzFeed published a quiz titled “How Privileged Are You?,” containing questions regarding job security, sexual orientation, race and a multitude of other characteristics. Other websites like WebMD and Trivago collect health and travel information on their users. This is the kind of information marketers work so hard to collect, analyze and utilize; websites like BuzzFeed have just made their job a lot easier.




*Much of the information used in this blog post comes from Bruce Schneier’s 2015 book Data and Goliath.


  1. polmankevin

    Great post. The massive amount of data that people are generating and companies are consuming is fascinating. However, it is also problematic. Many companies feel that big data is the answer to all of their business problems, but there is a difference between collecting data and hoarding it. Hoarding data can make it that much more difficult to find business insights. It’s like finding a needle in a haystack, except the haystack never stops growing. There is obviously so much useful data out there. The key is to be able to organize it effectively and find the needle in the haystack. That is why companies are hiring so many analysts, because there is a difference between having data and being able to use data.

  2. adamsmea89

    This was a great post! I knew companies had figured out ways to track a lot of your purchases / locations but I did not know the extent of it. The fact that they track customers inside their stores to see which displays work / don’t work was really surprising to me. I was also surprised to hear that websites like BuzzFeed can store and sell your answers. It seems like there really is no way around having everything you do tracked if you use credit cards and have a smartphone. Other than the fact that ads can be annoying to see over and over again, I wonder if there are negative implications to this. For example, could WebMD ever sell that type of data to a company that could use it against you in some way? It seems like anything is possible with that much data being collected.

  3. Really insightful post! Data mining is really pretty scary. I’m sure all of us with a Facebook account have been noticing very targeted ads about items we seek on other websites, even on other computers. Recently I looked up a page for an upcoming race on my work Google browser. I didn’t register, but just opened the page once. Later that day, I saw an ad for that race on my personal laptop in Chrome. Similarly, I researched a prescription and later saw an ad for prescription delivery services on Facebook. It’s crazy! It makes me nervous about how personal information, especially medical, can be spread across different channels, completely disrupting our expected standards of privacy. I know first hand how companies rely on cookies to help gather audience characteristics and create targeted marketing campaigns, but I’m concerned about how this personal data can be misused — especially when most people don’t even know that they are being “watched.”

  4. It’s crazy how much data these companies have about their consumers now! Target specifically sending certain coupons to certain people based on their activity was particularly interesting. Uber is another really good example. It has gotten my routine down to a tee. It will suggest certain locations to me when I open the app, sometimes before I even type anything, but only at the particular times on specific days that I go to these destinations. Really good post.

  5. I love that you opened with that example. I’ve heard it in LITERALLY EVERY analytics class that I’ve ever taken at BC.

    It’s terrifying to think about the terabytes of data that companies have on us. Do you think it’s ethical for companies to be profiling us and trying to draw many of these conclusions about our preferences? Just looking at the outrage with Target knowing about pregnancy and Facebook guessing (often not well…) people’s political preferences, I’m sure that there would be a firestorm if many of these other categorizations came out. Something tells me that behind these screens they might not be thinking about us in ways that they would if they thought we would find out…

    It still astounds me that people get up in arms about data privacy with major breaches and illegal usage…but are completely ignorant of all of this common information about the ways companies are legally using our data in creepy ways.

    Solid post!

  6. Aditya Murali

    Wonderful post! The lengths that people/companies go to in order to collect our data are downright scary. The thought that at least one company (including Angry Birds apparently) is tracking my location at any given time is extremely troubling. This truly reinforces my feeling that we are never alone. I realized that I am fine when my data is collected, in whatever shape or form, as long as I am just a statistic in a big pool of data. What I’m not okay with is data being collected to build a profile on me as a unique individual. Unfortunately, I know that the present and future of data collection will always be about the individual, so that basically sucks for me.

  7. It feels strange to think that to most companies, I am a record in a relational database with a primary key of my name or email and foreign keys and tables linking me to my location, buying preferences, politics, social connections, etc.

    I like having ads targeted for me. I really like this. Saves me time and puts ideas, products, and services in front of me that I might benefit from, even when I had never heard of it before.

    I don’t like knowing my life is documented and I have no privacy, especially when we’ve seen so many companies’ databases get hacked by criminals or others who will take advantage.
