By Vasant Dhar and Elaine Chang

One of the most fascinating aspects of the World Wide Web’s open access is that people sound off on virtually every topic of interest. As a consequence, the Web is fast becoming the repository for global information, and an increasing share of information on the Internet is being generated by individuals, rather than organizations or “experts.” How people use and are influenced by this information is an active area of research. We particularly wanted to understand how music sales are affected by user-generated content.

Previous research has found that online consumer reviews can predict book and movie sales, but we know of no prior study that has explored whether user-generated content, especially blogs and social networking sites, can predict online music sales when considered alongside traditional sources of reviews. Our question was whether user-generated content has any predictive value for music sales, or whether it is largely retrospective, or just plain noise.

We investigated the impact of user-generated content on sales of music CDs – which still account for 85 percent of the music market – by looking at blogs and social networking sites. A blog (short for Web log) is a website that is usually written like a journal, with postings arranged in reverse chronological order. Some surveys estimate that a staggering 30 percent of the US population considers blogs an important source of information. Social networking sites enable people to create profiles and make connections to others who live in the same area, share similar interests, or simply seem interesting. Each profile displays a public list of friends, and these connections are mutual: both users have listed each other as a friend. In assessing the significance of user-generated content, we compared it with more traditional information sources, such as professional reviews in print or electronic media.


Try it, you'll like it!

Music, like books, movies, vacation spots, and even medical and financial advice, is considered an “experience good” – a product whose quality is difficult to observe or sample adequately before purchase. People often rely on others for input in making a decision about whether to buy or utilize such a product. The influence of traditional and user-generated content – whether by professional or amateur reviewers – has been a key area for research in relation to the movie and book industries.

With the ability to sample music on the Web, music has become somewhat less of an experience good. Does that mean that what others say about a new album shouldn’t matter? Perhaps. But blogs could also serve as an “attention directing” mechanism that generates more awareness. In other words, if there is a large volume of blog posts about an album, chances are that the album is creating some buzz. Our hypothesis was that blogs matter, because we believe that a lot of effort goes into writing good blogs, and their authors feel passionate enough to spend time writing and sharing their thoughts with others. Readers recognize and pay attention to good blogs. A good reputation helps a blog attract traffic, and those readers are, in turn, influenced by its content.

We also hypothesized that social networks matter. In the music industry, the social networking site Myspace has a strong reputation for promoting music artists. The site provides a special music category that allows artists to create profile pages, including band biographies, upcoming tour dates, and streaming music tracks. Through these band profiles, Myspace users can simultaneously promote artists they like to their friends and bookmark the artists’ work. The number of friends displayed on a band’s Myspace page is like a public badge of popularity. We would expect that a band with thousands of friends on Myspace would be more popular with Myspace users than a band with just a handful.

Methodology and Data

Our methodology was to gather data tracing the changes in user-generated content for an album by tracking the volume of blog chatter, the number of friends an artist has on Myspace, and the album reviews for four weeks before and after the release date. We controlled for external differences, such as promotion budgets, by recording whether an album was released by a major or an independent label. We constructed measurable indicators for user-generated content as well as for traditional content, such as album reviews from mainstream sources like Rolling Stone, with the intention of understanding their relative significance for music sales. Blog chatter and the extent of social network connectivity served as proxies for user-generated content. We then built models to examine the relative significance of these variables in predicting album unit sales two weeks ahead.
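Predicting sales “two weeks ahead” means each week’s indicators are paired with the sales observed two weeks later. The sketch below shows only that data-pairing step for one album, under stated assumptions: the `WeeklyObservation` record, its field names, and the `make_forecast_rows` helper are hypothetical, only a subset of the indicators is included, and the model actually fitted to such rows is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class WeeklyObservation:
    """One album-week of indicators. All field names are hypothetical."""
    week: int              # weeks relative to release; 0 = release week
    blog_posts: int        # volume of blog chatter that week
    myspace_friends: int   # friends shown on the band's Myspace page
    major_label: bool      # control for differences in promotion budgets
    units_sold: float      # unit sales estimated for that week


def make_forecast_rows(history, horizon=2):
    """Pair each week's indicators with the unit sales `horizon` weeks later.

    Returns (features, target) tuples. Weeks whose target week lies
    outside the observed window (e.g. the last two weeks) are dropped.
    """
    by_week = {obs.week: obs for obs in history}
    rows = []
    for obs in sorted(history, key=lambda o: o.week):
        future = by_week.get(obs.week + horizon)
        if future is None:
            continue  # no sales observed `horizon` weeks later
        features = (obs.blog_posts, obs.myspace_friends, int(obs.major_label))
        rows.append((features, future.units_sold))
    return rows


# Example: weeks -4..4 around release yield 7 rows for a 2-week horizon.
history = [WeeklyObservation(w, 10 + w, 100, True, float(w + 5)) for w in range(-4, 5)]
rows = make_forecast_rows(history)
print(len(rows), rows[0])  # prints: 7 ((6, 100, 1), 3.0)
```

Stacking such rows across all albums would give the panel on which a predictive model could be estimated.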


Our data consisted of album statistics and other data collected from publicly available websites. We compiled the sample by collecting the names of albums released in the US between January 16 and March 6, 2007, from Pause & Play, a website devoted solely to listing upcoming album releases. Old material, such as reissues and compilations, was excluded from the sample.

We focused on physical CD sales, since information on digital music sales is difficult to obtain and downloading still holds a far smaller market share. We estimated album sales from Amazon sales ranks, because Amazon is one of the largest online CD retailers and its sales ranks are easily observed. (Nielsen Soundscan would have been the ideal source for album sales data, as it is the industry-standard tracking system for sales of music products in the US, but its data are proprietary and very expensive to obtain.) We cross-checked the release date given by Pause & Play against Amazon’s page for the album to verify that the record label had not moved the release date; if the album did not have a corresponding page on Amazon, it was eliminated from the sample. Since Amazon allows consumers to preorder products well ahead of the actual release date, we were able to separate chatter that occurs before the product can be evaluated from chatter that follows its release. The final sample consisted of 108 albums.
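The sample-construction rules above amount to a sequence of filters. A minimal sketch, assuming each candidate album has been recorded with the hypothetical fields below; the text says only that the Amazon date was cross-checked, so dropping albums whose date moved is an assumption made here for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Release window stated in the text.
WINDOW_START = date(2007, 1, 16)
WINDOW_END = date(2007, 3, 6)


@dataclass
class Candidate:
    """A listed upcoming release. Field names are hypothetical."""
    title: str
    listed_release: date             # date given by the release-listing site
    reissue_or_compilation: bool     # old material is excluded
    amazon_release: Optional[date]   # None when Amazon has no product page


def build_sample(candidates):
    """Apply the sample filters described in the text, in order."""
    sample = []
    for c in candidates:
        if not WINDOW_START <= c.listed_release <= WINDOW_END:
            continue  # released outside the study window
        if c.reissue_or_compilation:
            continue  # reissues and compilations excluded
        if c.amazon_release is None:
            continue  # no Amazon page, so no sales ranks to observe
        if c.amazon_release != c.listed_release:
            continue  # ASSUMPTION: albums whose release date moved are dropped
        sample.append(c)
    return sample
```

Keeping each rule as its own early `continue` makes it easy to count how many albums each filter removes.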

The Chicken or the Egg?

One might question whether chatter truly predicts subsequent sales, or whether increased sales lead to increased chatter which, in turn, leads to increased sales. To test this, we restricted the dataset described earlier so that only pre-release chatter was considered, paired with post-release sales.
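The restriction can be sketched as follows, assuming weekly counts are keyed by week relative to release (0 = release week); the helper names are hypothetical. Chatter from the release week onward is discarded, and so are preorder sales, so the only link tested runs from earlier chatter to later sales. A plain Pearson correlation stands in here for the study’s models purely for illustration.

```python
import math


def pre_post_pair(weekly_chatter, weekly_sales):
    """Total pre-release chatter paired with total post-release sales.

    Both arguments map week relative to release (0 = release week) to a count.
    """
    pre_chatter = sum(n for week, n in weekly_chatter.items() if week < 0)
    post_sales = sum(n for week, n in weekly_sales.items() if week >= 0)
    return pre_chatter, post_sales


def pearson(xs, ys):
    """Pearson correlation; a simple stand-in, not the study's model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical albums: (weekly chatter, weekly sales).
albums = [
    ({-3: 4, -1: 6, 1: 50}, {-1: 2, 0: 30, 1: 20}),
    ({-2: 20, -1: 25, 2: 90}, {0: 120, 2: 60}),
    ({-1: 1}, {0: 8, 1: 4}),
]
pairs = [pre_post_pair(chatter, sales) for chatter, sales in albums]
chatter_totals, sales_totals = zip(*pairs)
print(pearson(chatter_totals, sales_totals))
```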

The results show unequivocally that user-generated content as measured by blog chatter matters in subsequent sales for music. Interestingly, the increase in size of the social network was not significant in the reduced dataset, suggesting that it may have no predictive value before release or that it may only matter after release.

Finally, it is natural to ask whether it is reasonable to conclude that increased blog chatter really causes an increase in sales, or whether other unobserved variables might be affecting both blog volume and sales. It is not possible to draw such a conclusion from this study. Perhaps the “quality” of the artist drives both blog chatter and sales: high quality might be recognized in the marketplace through some mechanism that, in turn, produces the patterns we observed. Without a strong prior model that includes such a variable, it is not possible to draw any causal connection. This is important not just theoretically but practically, because it means that it may be futile to engineer an increase in blog posts with the expectation that this will lead to higher sales!


Chatter Matters

Were there any significant interaction effects among the various metrics? We posited that these would be of particular interest to marketing managers who must sift through the burgeoning volume of Web 2.0 metrics becoming available on the Internet. Which metrics, when considered simultaneously, would provide insights not derivable by looking at them in isolation? It is currently difficult to formulate sound hypotheses about this. However, it is worthwhile to work bottom-up, using inductive pattern-discovery methods to find interesting interactions that can be tested further in future studies.

We analyzed the data to uncover significant interaction effects and found that blog chatter was the most important variable. If an album had more than 40 blog posts, it had an above-average level of sales. If an album had more than 40 blog posts and was released by a major label, it was likely to have very high sales. This was no surprise: a large number of blog posts indicates a high level of buzz, and a major-label release is more likely to receive significant promotion through channels other than the Internet. Interestingly, though, if blog chatter was extremely high (above 240 posts), an album was able to overcome the disadvantage of being released by an independent label. In fact, albums with such extreme highs in chatter corresponded to sales even higher than those of major-label, high-chatter albums. Independent-label albums with relatively high chatter (above 40 posts) that did not breach the 240-post level still sold more than the sample average, but relatively little compared with the groups above. Independent-label albums with low blog chatter had very low sales, as expected.

Finally, our results indicated that major-label releases with low blog chatter (fewer than 40 blog posts) and low numbers of Myspace friends had higher sales than major-label releases with low blog chatter and high numbers of Myspace friends. This seems counterintuitive at first, but in the sample, major-label releases without a Myspace page were counted as having zero Myspace friends, which could explain the result. In addition, the major-label releases that had a Myspace page but few Myspace friends came from artists such as John Mellencamp and Art Garfunkel, whose audiences, we presume, are mostly older and do not generally use Myspace.
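The interaction effects described in the last two paragraphs read like the branches of a decision tree. A minimal sketch that encodes those branches as rules, assuming ties at exactly 40 posts fall in the low-chatter branch; the 40- and 240-post thresholds come from the findings above, but the article gives no Myspace cutoff, so `friends_threshold` is a hypothetical parameter, and the returned labels only paraphrase the relative sales levels reported.

```python
def sales_segment(blog_posts, major_label, myspace_friends, friends_threshold):
    """Map an album to the sales level described in the text.

    The 40- and 240-post thresholds are from the findings; friends_threshold
    is hypothetical. Exactly 40 posts is treated as low chatter (assumption).
    """
    if blog_posts > 40:                      # high chatter: above average
        if major_label:
            return "very high"
        if blog_posts > 240:                 # extreme chatter beats label effect
            return "highest"
        return "above average, relatively low"
    if not major_label:
        return "very low"
    # Major label, low chatter: fewer Myspace friends went with higher sales,
    # possibly because albums with no Myspace page were coded as zero friends.
    if myspace_friends < friends_threshold:
        return "major low-chatter, higher"
    return "major low-chatter, lower"


print(sales_segment(300, False, 5000, friends_threshold=1000))  # prints: highest
```

Writing the findings as explicit rules makes each threshold a testable hypothesis for the kind of follow-up study proposed above.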



Chatter does matter. In general, the Internet has a lot of “organic” content, representing the collective feelings and opinions of a lot of people, many of whom are probably well informed on a wide range of subjects.

Our research shows that the Internet provides consumers with a powerful word-of-mouth channel for information on upcoming music releases. We analyzed the usefulness of blogs and social networks, as well as consumer, online-media, and mainstream-media reviews, in predicting album sales in the four weeks before and after the album’s release date. We found that the most significant variable is blog chatter, or the volume of blog posts about an album, with higher numbers of posts corresponding to higher sales.

Higher-percentage changes in Myspace friends may also be significant, although the results here were not consistent. We found that the average consumer rating is significant, while the number of consumer reviews is not. Our results also showed that average consumer ratings better predict sales than average mainstream media ratings.

Our analysis also showed that traditional factors cannot be ignored. While independent-label releases with extremely high blog chatter can sell even more units than major-label releases, we estimated that the average major-label release sold approximately 12 times as many units as the average independent-label release. We also found that the higher the number of mainstream media reviews, the greater the sales.

The results of this study suggest that record labels should take user-generated content seriously. Most notably, since blog chatter and Myspace friend counts are available before an album is released and shipped, record labels can examine these two variables to predict sales well before the album reaches stores.

At the same time, we caution against assumptions of causality for the reasons discussed earlier. If people start manipulating blog posts because they believe posts affect sales, the predictive power might disappear, because the underlying reasons for it would disappear. There is a crude analogy here to efficiency in financial markets, where predictive models lose their power over time as the relationships become recognized and exploited by people who seek to benefit from them.

VASANT DHAR is professor of information systems, chairman of the information systems group, and co-director of the Center for Digital Economy Research at NYU Stern, and ELAINE CHANG (BS ’07) recently graduated with a double major in finance and international business from NYU Stern.