Archive for April, 2011

S04E20: The Herb Garden Germination

April 5, 2011

Tonight is our first-ever guest post. It is by my close friend Kristina Lerman. Kristina and I met the first week of freshman year, when we were both physics majors. We spent many years together working on problem sets, which is how physicists like to spend their twenties. After getting her Ph.D. in physics from the University of California, Santa Barbara, Kristina became an expert in the mathematics of networks, especially online networks, long before “social networking” became a buzzword. So when there was a line in tonight’s script on meme theory by Amy Farrah Fowler, I immediately called Kristina for help. Now she’s been kind enough to explain to us the science behind tonight’s episode. So without further ado…

Tonight’s guest blogger: Prof. Kristina Lerman


(By Kristina Lerman)

AMY:  Meme theory suggests items of gossip are like living organisms that seek to reproduce using humans as their hosts.

In this episode, Sheldon and Amy discover that memes, or items of gossip and other information, are like infectious organisms that reproduce themselves using humans as hosts. They engage in a bit of “memetic epidemiology” as they conduct social experiments on their friends to test the theory that tantalizing pieces of gossip make stronger, more virulent memes that spread faster and farther among their friends than mundane pieces of information.

The idea that information moves through a social group like an infectious disease has itself proved to be a powerful meme. This analogy has informed sociologists’ attempts to understand many diverse phenomena, including adoption of innovations, the spread of fads and fashion, word-of-mouth recommendations, and social media campaigns. The analogy becomes even stronger when social interactions are encoded within a friendship graph, the so-called social network. In a social epidemic each informed, or “infected,” individual infects her network neighbors with some probability given by the transmissibility, which measures how contagious the infection is. Understanding social epidemics is crucial to identifying influential people, predicting how far epidemics will spread, and identifying methods to enhance or impede their progress. Advertisers and social media consultants have been busy devising “viral” marketing strategies. Much like an epidemiologist might advise people on ways to reduce the transmissibility of a virus (wash hands), or if that fails, figure out who should be vaccinated to limit its spread (kindergarten teachers in many cases), marketing types are interested in identifying individuals who will generate the greatest buzz if they receive free products and other incentives.

Though theoretical progress has been brisk, until recently, empirical studies of epidemics were limited to taking case histories of sick people and attempting to trace their contacts. The advent of social media has changed that. People are joining social media sites such as Twitter, Digg, Flickr, and YouTube to find interesting content and connect with friends and like-minded people through online social networks. Traces of human activity that are exposed by the sites have given scientists treasure troves of data about individual and group behavior. This data has given social science an empirical grounding that many physicists find irresistible. As a result, physicists (author included) have flooded the field, much to the chagrin of practicing social scientists. In the culture wars of science, physicists often come off as arrogant, like Sheldon, but that is the price of being right.

The detailed data about human behavior on social media sites has allowed us to study the dynamics of social epidemics quantitatively. In my own work I study how information spreads on Digg and Twitter. These sites allow users to add friends to their social network whose activities they want to follow. A user becomes infected by voting for (digging) or tweeting a story and exposes her network neighbors to it. Each neighbor may in turn become infected (i.e., vote or retweet), exposing her own neighbors to it, and so on. This way interest in a story cascades through the network. This data enables us to trace the flow of information along social links. We found that social epidemics look and spread very differently from diseases on networks. Contrary to our expectations, the vast majority of information cascades grew slowly and failed to reach “epidemic” proportions. In fact, on Digg, these cascades reached fewer than 1% of users.

There are a number of factors that could explain this observation. Perhaps users modulate the transmissibility of stories to stay within a narrow range near the threshold to prevent information overload. Perhaps the structure of the network (e.g., clustering or communities) limits the spread of information. Or it could be that the mechanism of social contagion, in other words, how people decide to vote for a story once their friends have voted for it, prevents interest in stories from growing. We examined these hypotheses through simulations of epidemic processes on networks and empirical study of real information cascades.

We found that while network structure somewhat limits the growth of cascades, a far more dramatic effect comes from the social contagion mechanism. Unlike the standard models of disease spread used in previous works on epidemics, repeated exposure to the same story does not make the user more likely to vote for it. We defined an alternative contagion mechanism that fits empirical observations and showed that it reproduces the observed properties of real information cascades on Digg.

(Longer version:  Specifically, we simulated the independent cascade model that is widely used to study epidemics on networks.   Each simulated cascade began with a single seed node who voted for a story. By analogy with epidemic processes, we call this node infected. The susceptible followers of the seed node decide to vote on the story with some probability given by the transmissibility, λ (lambda). Every node can vote for the story once, so at this point the seed node is removed, and we repeat the process with the newly infected nodes. A node who is following n voting nodes has n independent chances to decide to vote. Intuitively, this assumption implies that you are more likely to become infected if many of your friends are infected. )

Cascade size as a function of transmissibility λ (lambda) for simulated cascades on the Digg graph and the randomized graph with the same degree distribution. Heterogeneous mean field predicts cascade size as a fraction of the nodes affected. The line (hmf) reports these predictions multiplied by the total number of nodes in the Digg network.

After some time, no new nodes are infected, and the cascade stops. The final number of infected nodes gives the cascade size. These are shown in the figure above, where each point represents a single cascade with the y-axis giving the final cascade size and the x-axis giving the transmissibility, λ. Blue dots represent cascades on the original Digg graph, pink dots represent cascades on a randomized version of the Digg graph, and the gold line gives theoretical predictions. In both simulations, there exists a critical value of λ, the epidemic threshold, below which cascades quickly die out and above which they spread to a significant fraction of the graph.
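The simulation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `random_follower_graph` builds a small random toy graph (it is not the Digg graph), and the function names, graph size, and λ values are all made up for the example.

```python
import random
from collections import deque


def random_follower_graph(n_nodes, n_followers, rng):
    """Toy graph (an assumption): every node gets `n_followers` random followers."""
    followers = {}
    for node in range(n_nodes):
        picks = rng.sample(range(n_nodes), n_followers + 1)
        # Drop the node itself if it was picked, then trim to size.
        followers[node] = [v for v in picks if v != node][:n_followers]
    return followers


def simulate_cascade(followers, seed, lam, rng):
    """Run one independent cascade and return the set of nodes that voted."""
    infected = {seed}
    frontier = deque([seed])
    while frontier:
        voter = frontier.popleft()  # each voter gets exactly one turn to spread
        for follower in followers[voter]:
            # Every voting friend gives a susceptible follower one
            # independent chance, with probability lam, to vote.
            if follower not in infected and rng.random() < lam:
                infected.add(follower)
                frontier.append(follower)
    return infected  # the cascade stops when no new nodes are infected


rng = random.Random(0)
graph = random_follower_graph(n_nodes=2000, n_followers=5, rng=rng)
for lam in (0.05, 0.1, 0.2, 0.3, 0.5):
    sizes = [len(simulate_cascade(graph, rng.randrange(2000), lam, rng))
             for _ in range(20)]
    print("lambda =", lam, "mean cascade size =", sum(sizes) / len(sizes))
```

Sweeping λ on a toy graph like this should show the same qualitative behavior as the figure: small cascades at low λ and large ones once λ passes a threshold, although the threshold value depends on the graph used.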

Comparing the theoretical and simulation results to real cascades presents a puzzle. Why are cascades so small? According to our cascade model, only transmissibilities in a very narrow range near the threshold produce cascades of the appropriate size of ~500 votes. Clearly, the structure is not enough to explain the difference. To delve deeper, we looked at the contagion mechanism itself. We measured the probability that a Digg user votes for a story given that n of his friends have voted. We found that the independent cascade model grossly overestimates the probability of a vote even with 2 or 3 voting friends. In fact, we found that multiple exposures to a story only marginally increase the probability of voting for it.
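To see why the independent cascade model predicts a steadily rising vote probability, note that n independent chances, each succeeding with probability λ, give an overall probability of 1 − (1 − λ)^n. The short sketch below just evaluates that formula; the function name and the value λ = 0.01 are illustrative assumptions, not measurements from Digg.

```python
def vote_probability(lam, n_friends):
    """Chance of voting under the independent cascade model.

    The user fails to vote only if all n_friends independent
    chances fail, which happens with probability (1 - lam) ** n_friends.
    """
    return 1.0 - (1.0 - lam) ** n_friends


lam = 0.01  # illustrative transmissibility, not a fitted value
for n in range(1, 6):
    print(n, "voting friends ->", vote_probability(lam, n))
```

For small λ the predicted probability grows almost in proportion to n, which is the behavior the measured data did not show.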

Cascade size vs inferred transmissibility for simulated and real cascades on the Digg graph. HMF prediction of cascade size is shown for reference.

After simulating information cascades using the new contagion mechanism, we found that their size is an order of magnitude smaller than before, as shown in the figure above. The size of the real Digg cascades is similar to the simulated cascades, giving us confidence that we have uncovered the mechanism that limits the spread of information. These findings underscore the fundamental difference between the spread of information and disease: despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with repeated exposure.
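The post does not spell out the alternative contagion mechanism, so the sketch below uses a deliberately crude stand-in: a user who has seen a story once and passed on it gets no further chances from later exposures. This rule, the flag `repeat_exposure_helps`, and the toy random graph are all assumptions made for illustration, not the model fitted to Digg.

```python
import random
from collections import deque


def cascade_size(followers, seed, lam, rng, repeat_exposure_helps):
    """Size of one simulated cascade under either contagion rule."""
    voted = {seed}
    declined = set()  # nodes that saw the story and passed on it
    frontier = deque([seed])
    while frontier:
        voter = frontier.popleft()
        for follower in followers[voter]:
            if follower in voted:
                continue
            if not repeat_exposure_helps and follower in declined:
                continue  # stand-in rule: later exposures add no new chance
            if rng.random() < lam:
                voted.add(follower)
                frontier.append(follower)
            else:
                declined.add(follower)
    return len(voted)


rng = random.Random(1)
n = 1000
# Toy graph (an assumption): each node has 8 randomly chosen followers.
graph = {v: rng.sample(range(n), 8) for v in range(n)}
for rule in (True, False):
    sizes = [cascade_size(graph, rng.randrange(n), 0.2, rng, rule)
             for _ in range(50)]
    print("repeat exposure helps:", rule, "mean size:", sum(sizes) / len(sizes))
```

Setting `repeat_exposure_helps=True` recovers the independent cascade model, so the two rules can be compared on the same graph and the same λ.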

S04E19: The Zarnecki Incursion

April 2, 2011

(***SPOILER ALERT****  If you have not seen the episode yet, you may not want to read this post, which includes a minor spoiler.)

In this latest episode, the boys know how an internal combustion engine works.  Let’s learn how it works, and maybe it will be as useful someday to you as it was for them.

Biologists are not the only ones who do dissections. When I was in college, we dissected an internal combustion engine. It was easily as educational as slicing up a frog, with the added advantages of not smelling of formaldehyde and not making us feel really bad for a frog.

Just like dissecting a frog, dissecting a model airplane engine is a terrific way to learn how the engine (instead of the frog) works.

But first let’s dissect the phrase itself: “internal combustion engine”.

A motor is any machine that converts stored energy into useful mechanical motion, or as a physicist would say, work. Even a simple rower with an oar is converting his recent meal into motion of a boat and is a motor. But typically if the device starts with heat energy, as opposed to electricity or other stored power, we specifically call the motor an engine.

Another word for burning is combustion.  That provides the heat for our engine.  A burning log releases heat energy. But fossil fuels such as gasoline and natural gas are able to produce more heat per gram through combustion than nearly any other substance.  The only exception is hydrogen, producing three times more energy per gram through combustion than methane or gasoline.    Given that the cost of lifting jet fuel is a major expense for flying an airplane, I don’t know why airplanes don’t use hydrogen fuel.

Typically the mechanical work is first performed by a fluid, such as steam or a hot gas. When the fluid that is heated is separate from what provides the heat, it is called an external combustion engine. For example, in a steam engine, wood or some other material is burned, which in turn heats water into steam, which is pressed into service. But in an internal combustion engine the same fluid that was burned does the work. The simplicity leads to an economy of parts and efficiency.

To make a long story short, if you put a small amount of explosive fuel and air in a small volume and ignite it, a large amount of energy is released as expanding gas. If you are clever enough to do work with this gas, you have built an engine.

Such is the role of the piston. When the gas explodes, it pushes the piston and does work. But that’s not the whole story of the piston. The piston and little ports called valves perform a simple dance that carries out all the functions of an internal combustion engine. The engine we dissected in college was called a two-stroke engine because it performed all its work in just two steps. But far more common, and used in automobiles, is the four-stroke engine, which is even easier to understand.

A piston for a Chevy engine. The piston converts the energy of the exploding gas into mechanical energy.

The four steps of the dance are:

Stroke 1: (the “Intake Stroke”) The piston pulls back just as the valve opens to a source of fuel and air, usually already mixed just right. The pulling back of the piston fills the cylinder with explosive gas through the hole left by the open valve.

Stroke 2: (the “Compression Stroke”) The valve closes and the piston moves forward. This compresses the gas, but more importantly puts the piston in position to be moved outward by the upcoming explosion.

Stroke 3: (the “Power Stroke”) The fuel/air mixture is ignited with a spark and the piston is pushed outward with an enormous force. This is the point in the cycle that produces useful mechanical work. In a car the moving piston turns a shaft called the crankshaft so that the motion of the piston quickly becomes rotational energy. The car itself works with energies stored as rotations, eventually turning the wheels. The wheels turn against the road, and the force of friction between the tire and the road pushes the car forward (or backward if your transmission is in a reverse gear).

Stroke 4: (the “Exhaust Stroke”) A different valve opens so the burned gases can be expelled. This is the exhaust.

Notice that the piston goes in and out of the cylinder twice, while only producing work once.

The process repeats itself thousands of times per minute. Typically each piston is in a different part of the cycle so that the piston doing the work (the power stroke) can move the other pistons to do their job on each stroke. The crankshaft turns while the pistons go in and out. Such motion of the pistons is called reciprocating, and often this kind of engine is called a reciprocating engine.
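The four-step dance and the staggering of the pistons can be sketched as a tiny Python model. The four-cylinder layout with pistons offset by one stroke each is a hypothetical example, as are the function names. The arithmetic relies on the fact that one stroke is half a crankshaft turn, so a full cycle takes two turns.

```python
STROKES = ("intake", "compression", "power", "exhaust")


def stroke_at(step, offset=0):
    """Stroke a piston performs at a given step, shifted by `offset` strokes."""
    return STROKES[(step + offset) % len(STROKES)]


def power_strokes_per_minute(crank_rpm, cylinders=1):
    # One stroke is half a crankshaft turn, so the four-stroke cycle
    # takes two turns and yields one power stroke per cylinder.
    return crank_rpm / 2 * cylinders


# Hypothetical four-cylinder engine with pistons staggered by one stroke each:
# at every step exactly one piston is on its power stroke.
for step in range(4):
    print(step, [stroke_at(step, offset) for offset in range(4)])

print(power_strokes_per_minute(3000, cylinders=4), "power strokes per minute")
```

In this toy layout there is always one piston delivering work while the other three are being carried through their intake, compression, and exhaust strokes.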

It works fine once it is going, but getting it started is the trick. Anyone who has turned the ignition many times on a winter morning knows how hard this can be. Or you might be faced with pulling the ignition cord on a lawnmower.

Any leaks around the piston are bad news. They cause a loss of compression, which is perhaps Leonard’s problem. Meanwhile oil around the crankshaft can leak into the combustion cylinder and burn, producing smoke and loss of oil.

If Leonard’s problem was that his car lost too much oil, the damage to the engine means we may not be seeing him drive it ever again.
