Tonight is our first-ever guest post. It is by my close friend Kristina Lerman. Kristina and I met during the first week of freshman year, when we were both physics majors. We spent many years together working on problem sets — which is how physicists like to spend their twenties. After getting her Ph.D. in physics from the University of California, Santa Barbara, Kristina became an expert in the mathematics of networks, especially online networks, long before “social networking” became a buzzword. So when there was a line on meme theory in tonight’s script, delivered by Amy Farrah Fowler, I immediately called Kristina for help. Now she’s been kind enough to explain to us the science behind tonight’s episode. So without further ado…
(By Kristina Lerman)
AMY: Meme theory suggests items of gossip are like living organisms that seek to reproduce using humans as their hosts.
In this episode, Sheldon and Amy discover that memes, or items of gossip and other information, are like infectious organisms that reproduce themselves using humans as hosts. They engage in a bit of “memetic epidemiology” as they conduct social experiments on their friends to test the theory that tantalizing pieces of gossip make stronger, more virulent memes that spread faster and farther among their friends than mundane pieces of information.
The idea that information moves through a social group like an infectious disease has itself proved to be a powerful meme. This analogy has informed sociologists’ attempts to understand many diverse phenomena, including the adoption of innovations, the spread of fads and fashion, word-of-mouth recommendations, and social media campaigns. The analogy becomes even stronger when social interactions are encoded within a friendship graph, the so-called social network. In a social epidemic each informed, or “infected,” individual infects her network neighbors with some probability given by the transmissibility, which measures how contagious the infection is. Understanding social epidemics is crucial for identifying influential people, predicting how far an epidemic will spread, and finding ways to enhance or impede its progress. Advertisers and social media consultants have been busy devising “viral” marketing strategies. Much like an epidemiologist might advise people on ways to reduce the transmissibility of a virus (wash your hands), or, if that fails, figure out who should be vaccinated to limit its spread (kindergarten teachers, in many cases), marketing types are interested in identifying the individuals who will generate the greatest buzz if they receive free products and other incentives.
Though theoretical progress has been brisk, until recently empirical studies of epidemics were limited to taking case histories of sick people and attempting to trace their contacts. The advent of social media has changed that. People are joining social media sites such as Twitter, Digg, Flickr, and YouTube to find interesting content and connect with friends and like-minded people through online social networks. Traces of human activity exposed by these sites have given scientists treasure troves of data about individual and group behavior. This data has given social science an empirical grounding that many physicists find irresistible. As a result, physicists (author included) have flooded the field, much to the chagrin of practicing social scientists. In the culture wars of science, physicists often come off as arrogant, like Sheldon, but that is the price of being right.
The detailed data about human behavior on social media sites has allowed us to quantitatively study the dynamics of social epidemics. In my own work I study how information spreads on Digg and Twitter. These sites allow users to add to their social networks friends whose activities they want to follow. A user becomes infected by voting for (digging) or tweeting a story, and she exposes her network neighbors to it. Each neighbor may in turn become infected (i.e., vote or retweet), exposing her own neighbors to it, and so on. In this way interest in a story cascades through the network. This data enables us to trace the flow of information along social links. We found that social epidemics look and spread very differently from diseases on networks. Contrary to our expectations, the vast majority of information cascades grew slowly and failed to reach “epidemic” proportions. In fact, on Digg, these cascades reached fewer than 1% of users.
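The cascade-tracing step can be sketched in code. Here is a minimal, hypothetical reconstruction: given a time-ordered vote log and each user's friend list, attribute each vote to the earliest-voting friend the voter follows. The function name, the attribution rule, and the toy data are illustrative assumptions, not the study's exact procedure.

```python
def trace_cascade(votes, friends):
    """Link each vote to the earliest-voting friend the voter follows.

    votes:   list of (user, time) pairs, sorted by time
    friends: dict mapping each user to the set of users she follows
    Returns the list of inferred infection edges (source, voter).
    """
    voted_at = {}   # user -> time of her vote
    edges = []
    for user, t in votes:
        # friends of this user who have already voted (her "infected" exposures)
        sources = [f for f in friends.get(user, set()) if f in voted_at]
        if sources:
            earliest = min(sources, key=lambda f: voted_at[f])
            edges.append((earliest, user))
        voted_at[user] = t
    return edges

# toy example: ann votes first, then bob (follows ann), then cat (follows both)
votes = [("ann", 0), ("bob", 1), ("cat", 2)]
friends = {"bob": {"ann"}, "cat": {"ann", "bob"}}
print(trace_cascade(votes, friends))  # [('ann', 'bob'), ('ann', 'cat')]
```

Stitching these inferred edges together over many stories is what lets you measure how far each cascade actually traveled through the follower graph.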
There are a number of factors that could explain this observation. Perhaps users modulate the transmissibility of stories to stay within a narrow range near the threshold, to prevent information overload. Perhaps the structure of the network (e.g., clustering or communities) limits the spread of information. Or it could be that the mechanism of social contagion, that is, how people decide to vote for a story once their friends have voted for it, prevents interest in stories from growing. We examined these hypotheses through simulations of epidemic processes on networks and empirical study of real information cascades.
We found that while network structure somewhat limits the growth of cascades, a far more dramatic effect comes from the social contagion mechanism. Unlike the standard models of disease spread used in previous works on epidemics, repeated exposure to the same story does not make the user more likely to vote for it. We defined an alternative contagion mechanism that fits empirical observations and showed that it reproduces the observed properties of real information cascades on Digg.
(Longer version: Specifically, we simulated the independent cascade model that is widely used to study epidemics on networks. Each simulated cascade began with a single seed node who voted for a story. By analogy with epidemic processes, we call this node infected. The susceptible followers of the seed node decide to vote on the story with some probability given by the transmissibility, λ (lambda). Every node can vote for the story once, so at this point the seed node is removed, and we repeat the process with the newly infected nodes. A node who is following n voting nodes has n independent chances to decide to vote. Intuitively, this assumption implies that you are more likely to become infected if many of your friends are infected. )
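The simulation just described can be sketched in a few lines of Python. This is only a toy version: the random follower graph, its size, and the parameter values are made-up stand-ins, not the actual Digg network used in the study.

```python
import random

def independent_cascade(followers, seed, lam, rng=random):
    """One run of the independent cascade model.

    followers: dict mapping each user to the list of users who follow her
    seed:      the initially infected (voting) node
    lam:       transmissibility -- chance that one exposure produces a vote
    """
    infected = {seed}   # nodes that have voted
    frontier = [seed]   # newly infected nodes that can still spread
    while frontier:
        next_frontier = []
        for node in frontier:
            for fan in followers.get(node, []):
                # every exposure is an independent chance to vote;
                # a node can vote (become infected) at most once
                if fan not in infected and rng.random() < lam:
                    infected.add(fan)
                    next_frontier.append(fan)
        # spreaders are "removed" after one round, as described above
        frontier = next_frontier
    return len(infected)    # final cascade size

# toy follower graph of 200 users with random follow links
rng = random.Random(42)
nodes = range(200)
followers = {u: [v for v in nodes if v != u and rng.random() < 0.03]
             for u in nodes}
size = independent_cascade(followers, seed=0, lam=0.2, rng=rng)
print(size)
```

Running this many times while sweeping `lam` from 0 up toward 1 traces out exactly the kind of size-versus-transmissibility picture described next.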
After some time, no new nodes are infected, and the cascade stops. The final number of infected nodes gives the cascade size. These are shown in the figure above, where each point represents a single cascade, with the y-axis giving the final cascade size and the x-axis giving the transmissibility, λ. Blue dots represent cascades on the original Digg graph, pink dots represent cascades on a randomized version of the Digg graph, and the gold line gives the theoretical prediction. In both simulations, there exists a critical value of λ, the epidemic threshold, below which cascades quickly die out and above which they spread to a significant fraction of the graph.
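For uncorrelated random networks this epidemic threshold has a well-known closed form, λ_c = ⟨k⟩ / (⟨k²⟩ − ⟨k⟩), where ⟨k⟩ and ⟨k²⟩ are the first and second moments of the degree distribution. A quick sketch, using illustrative degree sequences rather than the Digg graph:

```python
def epidemic_threshold(degrees):
    """lambda_c = <k> / (<k^2> - <k>) for an uncorrelated random network."""
    n = len(degrees)
    k1 = sum(degrees) / n                  # first moment <k>
    k2 = sum(d * d for d in degrees) / n   # second moment <k^2>
    return k1 / (k2 - k1)

# a 3-regular random graph: every node has degree 3, threshold 1/(3-1) = 0.5
print(epidemic_threshold([3] * 100))  # 0.5

# a more heterogeneous degree sequence lowers the threshold
print(epidemic_threshold([1, 2, 3, 10] * 25))
```

The second example shows why heavy-tailed social networks are so hospitable to epidemics: a few very well-connected hubs inflate ⟨k²⟩ and push the threshold down.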
Comparing the theoretical and simulation results to real cascades presents a puzzle. Why are cascades so small? According to our cascade model, only transmissibilities in a very narrow range near the threshold produce cascades of the appropriate size of ~500 votes. Clearly, network structure alone is not enough to explain the difference. To delve deeper, we looked at the contagion mechanism itself. We measured the probability that a Digg user votes for a story given that n of his friends have voted for it. We found that the independent cascade model grossly overestimates the probability of a vote even with just two or three voting friends. In fact, multiple exposures to a story only marginally increase the probability of voting for it.
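The gap between the two mechanisms is easy to see numerically. Under the independent cascade model, n voting friends give n independent chances, so the vote probability is 1 − (1 − λ)^n and climbs quickly with n. The saturating form below is only a hypothetical stand-in for the empirical Digg curve (the study's fitted function, and the `eps` value, are not reproduced here):

```python
def p_independent(lam, n):
    """Independent cascade: n voting friends give n independent chances."""
    return 1 - (1 - lam) ** n

def p_saturating(lam, n, eps=0.02):
    """Hypothetical stand-in for the empirical curve: repeated
    exposures add only a marginal bump (eps is made up)."""
    return min(1.0, lam + eps * (n - 1)) if n > 0 else 0.0

# vote probability vs. number of voting friends, at lam = 0.1
for n in (1, 2, 5, 10):
    print(n, round(p_independent(0.1, n), 3), round(p_saturating(0.1, n), 3))
```

Even by n = 5, the independent-cascade probability has roughly quadrupled while the saturating one has barely moved, which is exactly the kind of damping that keeps cascades small.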
After simulating information cascades using the new contagion mechanism, we found that their size is an order of magnitude smaller than before, as shown in the figure above. The size of the real Digg cascades is similar to that of the simulated cascades, giving us confidence that we have uncovered the mechanism that limits the spread of information. These findings underscore the fundamental difference between the spread of information and of disease: despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with repeated exposure.