The life of a minor minor prophet, not the rock


Monday, February 17, 2003
Weblogs and power laws

Many systems and phenomena are distributed according to a power law distribution. A power law applies to a system when large is rare and small is common. The distribution of individual wealth is a good example of this: there are a very few rich men and lots & lots of poor folks. A familiar way to think about power laws is the 80/20 rule: 80% of the wealth is controlled by 20% of the population.

It's been shown that the distribution of links on the web scales according to a power law, so it comes as no surprise that the distribution of links to weblogs does as well. Taking the top 100 most linked to weblogs on Technorati as a data set (specifically from 1/24/03), I used Excel to plot and fit a curve to the data:

Figure: weblogs obeying the mighty power law

....

This NEC study reveals that the deviation of a set of data from the power law correlates with how much competition is present in the system. The better the fit, the more competitive the environment is. Again, no surprise that the system of weblogs is a highly competitive one.

But what are weblogs competing for? Matt Webb posits that power laws arise due to scarcity.... The scarcity of people's time results in the distribution of links that can be described using power laws.

The idea is that instead of using a quadratic or cubic equation that kinda fits the data, you use a power law equation generated by the data itself to exactly fit the data (or nearly so). The power law equation I derived using the limited sample of the top 100 list is:

y = 5989.8x^(-0.8309)

where y is the # of inbound blogs and x is the rank of the site. I plotted the top 100 data again and tried to fit three curves to it:

Figure: fitting three curves to the Technorati top 100 data

The dotted blue line is a linear equation, the dashed red line is a quadratic equation, and the solid black line is the aforementioned power law equation. As you can see, the linear and quadratic equations fit the data poorly. The R-squared is 0.31 for the linear equation, 0.55 for the quadratic, and 0.99 for the power law equation. So the quadratic is an improvement over the linear equation, but neither compares to the excellent fit of the power law and the excellent results that would follow from using it for Technorati's interesting recent blogs lists.

[Kottke.org]
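
For anyone who wants to try this kind of fit without Excel, here's a minimal sketch in Python. It fits y = a * x^b by ordinary least squares on the logs of both variables, which as far as I know is also what Excel's power trendline does. I don't have the Technorati data, so the `counts` list below is synthetic, generated from the published equation purely to check that the fit recovers its coefficients. The name `fit_power_law` is my own, not anything from Kottke's post.

```python
import math

def fit_power_law(ranks, counts):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r_squared), where r_squared is measured in log space.
    """
    log_x = [math.log(x) for x in ranks]
    log_y = [math.log(y) for y in counts]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Slope and intercept of the straight line through the log-log points.
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    intercept = mean_y - b * mean_x
    a = math.exp(intercept)

    # R-squared of that straight line (still in log space).
    ss_res = sum((y - (intercept + b * x)) ** 2 for x, y in zip(log_x, log_y))
    ss_tot = sum((y - mean_y) ** 2 for y in log_y)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared

# Synthetic stand-in for the top 100 list (NOT the real Technorati data).
ranks = list(range(1, 101))
counts = [5989.8 * x ** -0.8309 for x in ranks]

a, b, r_squared = fit_power_law(ranks, counts)
print(f"y = {a:.1f} * x^({b:.4f}), R-squared = {r_squared:.4f}")
```

One caveat: the R-squared here is computed on the logged values, so on real, noisy link counts it isn't directly comparable with an R-squared computed on the raw counts for a linear or quadratic fit.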


11:58:17 PM    
Breaking the (power) law

I thought about the problem that this presented to a traditional link engine.  When you rank bloggers simply by the number of people who link to them, you get a very static list of "a-list" bloggers, as shown by the Technorati Top 100.  What I wanted to do was to break that power law, and give more exposure to the lesser known, but still interesting bloggers, especially on days when they stand out and do something interesting.

Sifry explains how he inverted the power law to flatten the curve

Basically, the idea is that for a relatively obscure blogger who has, say, 40 people currently linking to his blog, getting 4 or 5 new blogs linking to him can have the same effect as an a-list blogger getting 40 or 50 new links.

This is interesting research for me, but the most satisfying thing about it is that I've found a way to identify interesting new writers and add them to my blogroll.

[Sifry's Alerts]
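
The excerpt doesn't give Sifry's actual formula, but the 40-versus-4 example suggests scoring blogs by new links relative to the links they already have. Here's a rough sketch of that idea, not his implementation. The blog names and link counts are made up, and the 400-link a-list blog is my assumption, chosen so that 50 new links matches the obscure blogger's 5 on 40.

```python
def relative_growth(current_links, new_links):
    """Score new inbound links relative to existing ones.

    Assumption: blogs with no inbound links yet are treated as having one,
    to avoid dividing by zero.
    """
    return new_links / max(current_links, 1)

# Hypothetical data: name -> (current inbound links, new inbound links today)
blogs = {
    "obscure-blog": (40, 5),
    "a-list-blog": (400, 50),
    "quiet-a-list-blog": (400, 5),
}

# Rank by relative growth instead of raw link count.
ranked = sorted(blogs, key=lambda name: relative_growth(*blogs[name]), reverse=True)
for name in ranked:
    print(name, relative_growth(*blogs[name]))
```

Ranked this way, the obscure blog with 5 new links ties the a-list blog with 50, and both sit well above an a-list blog having a quiet day.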


11:46:58 PM    

A detailed analysis of power laws as applied to weblogs, newspapers and movies:

http://homepage.mac.com/kevinmarks/powerlaws.html

The conclusions he comes to are:

1. Weblog links do follow a power law
2. This saturates less quickly than other media, due to low barriers to entry
3. Therefore the many lightly linked weblogs outnumber the few heavily linked ones

[Kevin Marks]


11:26:34 PM    

Power Less

... Investigating the "blogosphere" is interesting, but not nearly as interesting as recognizing that there are blogospheres. And these spheres are neither wholly separate nor wholly integrated with the rest of the network. What constitutes a border for such a sphere? What does this differentiation mean to the structure of the whole?

[Alex.Halavais.net]

The author, Alexander Halavais, works at the School of Informatics at U. Buffalo.  Either we're bringing him in for a student speakers talk or I'm gonna go visit him. Or both.


11:12:16 PM    

Splashpower

SplashPad™
This is a universal wireless charging platform which delivers power to mobile devices. The SplashPad is a portable flat surface less than 6mm thick powered from any electric outlet. Put as many devices as you can fit on it and charge up in a truly intuitive fashion. It can even be built into cars, desks and airplane tables!


11:02:26 PM    

So I am reading Steven Johnson's book Emergence: The Connected Lives of Ants, Brains, Cities, and Software, trying to prepare for an 8,000-word article I have to write for Illume on the future of information. I've been thinking about just this issue for the last month. I think that trying to connect the discussion about emergence with this issue is key to understanding how blogs are different.

...  Although the search engines and metaindexes are useful, they are no longer the first place you go. I read my RSS news feeds before I go searching on a portal for news. As Dave says, I don't know most of the blogs on the top 100 list and I don't care. We are organized into more intelligent communities and although there is a power law of sorts with respect to blogs that get a lot of attention, there are many local peaks. I think it looks much more like clusters of blogs with interconnections between communities. A lot like a strength of weak ties sort of map.

...

Technorati top 100 ranking is not as important to me as WHO is linking to me. When I was running Infoseek, all I cared about was HOW MANY page views we were getting. Sure I brag about my page views to people who don't blog because that's a metric they understand, but the really interesting stuff is going on at a higher level I think.

So... How do we capture the next higher level of order? Well, that's what I hoped you might have an answer for. I think there are ways to look at this subjectively.

One way might be to track a meme through blogspace. See how an idea like your article gets picked up, quoted and where it ends up. Map that and you have one space. Each meme is like a tracer. Some communities will pick up certain ideas, while others will not. You can find the weak ties between two communities as these memes make their way across networks. MANY memes will end up being very local, and SOME will end up on EVERY blog. But I don't know how to do this.

I also want to read his paper on the connection between weblogs, power laws, and democracy.

[Joi Ito]
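
Joi says he doesn't know how to do this, and I don't have a real answer either, but the bookkeeping part seems tractable if you can extract who-picked-it-up-from-whom links (a big if). Here's a minimal sketch, assuming you already have (blog, source) pairs for a single meme. The blog names are placeholders and `trace_meme` is a hypothetical helper of mine, not an existing tool.

```python
from collections import defaultdict, deque

def trace_meme(citations, origin):
    """Return {blog: hops from origin} for every blog the meme reached.

    citations: list of (blog, source) pairs, meaning `blog` picked the
    meme up from `source` (for example via a "via" link in the post).
    """
    picked_up_from = defaultdict(list)
    for blog, source in citations:
        picked_up_from[source].append(blog)

    # Breadth-first walk outward from the blog where the meme started.
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for follower in picked_up_from[current]:
            if follower not in hops:
                hops[follower] = hops[current] + 1
                queue.append(follower)
    return hops

# Placeholder data: blog-b and blog-c cite blog-a, blog-d cites blog-c.
citations = [("blog-b", "blog-a"), ("blog-c", "blog-a"), ("blog-d", "blog-c")]
reach = trace_meme(citations, "blog-a")
print(reach)
```

Blogs that never show up in the result didn't pick the meme up, which is the "some communities will, others will not" signal he's after. Run it over many memes and the links that keep carrying ideas between otherwise separate clusters would be candidates for the weak ties.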


10:35:51 PM    

The richness of human languages is a fine-tuned compromise between the needs of speakers and of listeners, explain Ramon Ferrer i Cancho and Ricard Solé of the Universitat Pompeu Fabra in Barcelona. Just a slight imbalance of these demands prevents the exchange of complex information, they argue.

... Human languages, say the duo, seem to sit right on this sudden change. When it happens, the frequency of word usages develops a distinctive mathematical form, called a power law. The power law disappears on either side of the communication jump.

It has been known since the 1940s that human languages do indeed show just this kind of statistical distribution of word usage - the social scientist George Kingsley Zipf spotted the power-law behaviour. But it has never been satisfactorily explained before, although Zipf himself speculated that it might represent some kind of "principle of least effort".

[nature.com]
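
Zipf's observation is easy to poke at yourself: count how often each word appears, sort by frequency, and compare frequency against rank. Under Zipf's law the frequency falls off roughly as 1/rank, so rank times frequency stays roughly constant. Here's a minimal sketch. The sample sentence is a toy of my own and far too short to show the law, so feed it a long text to see the pattern. `rank_frequency` is a name I made up.

```python
from collections import Counter

def rank_frequency(text):
    """Return [(rank, word, frequency)] sorted from most to least common."""
    words = text.lower().split()
    counts = Counter(words)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]

sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, freq in table:
    # If Zipf's law held, rank * freq would be about the same on every row.
    print(rank, word, freq, rank * freq)
```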


10:08:41 PM    
Google and webtrails

GOOGLE ARE BUILDING THE MEMEX.

They've got one-to-one connections. Links. Now they've realised - like
Ted Nelson - that the fundamental unit of the web isn't the link, but
the trail. And the only place that's online is... weblogs.

There are two levels to the trail:

1 - what you see
2 - what you do
("And what you feel on another track" -- what song is that?)

And the trail is, in its simplest form, organised chronologically.
Later it gets more complex. Look to see Google introduce categories
based on DMOZ as a next step.

So, the GOOGLE TOOLBAR tracks everything you do on the web, giving
you low-level anonymous trails tying the web together. These are
analogous to the strings of physics, or the rows and columns of Excel.
This is 1, what you see.

Now there's the semantics, the meaning extracted from these, and that's
done with the human mind. This is 2, what you do. What you choose to
elevate. Now these trails are the basic units.
[interconnected.org]


 What Vannevar Bush, Ted Nelson, weblogs, and now Google are all demonstrating is that the boundaries between organizations and disciplines are arbitrary. It's the connections and the trails that matter.

[McGee's Musings]

Reading the meta-analysis by James McGee of the Pyra/Google deal, Memex, and webtrails makes my head spin.  It feels so right I can taste it.

Here's an example webtrail created by Mark Pilgrim for this page, showing how he found the Vannevar Bush essay As We May Think, published in July 1945 in The Atlantic.

Trail: Google Weblog (via RSS) -> Dan Gillmor -> Matt Webb -> search for “memex” -> The Atlantic.

Now I understand what Dave meant by "the Googlish way to do Directories".  He just folded up a webtrail and let you pivot across points of view at any step.  This is a more powerful metaphor than a linear list, and it's not surprising Dave thought of it given his background in outliners, but I think the simple list will be enough for most people.


9:27:31 PM    
Repost Google research

Google hasn't done a deep search of my site since I moved from radio.userland.com to my own server.  As a result some excellent posts from the past are, for all practical purposes, invisible.

I wanted to re-post a link to some research I did in November on Google, its competitors, and strategy.

More thoughts from interconnected.org on why Google bought Pyra.


9:05:33 PM    

Simple Reasoning on Google/Blogger. Kevin Lynch has managed to say in one paragraph what I've struggled to explain to people: The two main ways I find information on the internet today are searching and reading blogs--blogs are another view into the world's information... [Jeremy Zawodny's blog]
8:52:21 PM    

Google and Blogger.

Wow. You leave your news aggregator for a couple of days and Google goes out and buys Pyra, the company that created Blogger. Pretty big news, but I have to admit that my first thought is that this is a mistake. As much as I love Blogger, I don't think Google needed to do this.

Of course, the advantage to blogging this story late is that I can read and comment on others' opinions.

Anil Dash: "More to the point, Google's consistent marketing message so far has been, 'We do search, and we don't want to be a portal'. ...the reality is that it puts Google into a far different role than they've had so far."
-- I agree with Anil, and I'm worried that this is a sign that Google is branching out into an area that isn't integral to that mission.

Nick Denton: "Expect to see, first of all, that Blogger-powered sites show up in Google search results minutes after the posts are published, rather than days."
-- Actually, this wouldn't impress me at all. If Google can't evolve to do this for sites outside of its domain, then it will lose its edge. We're getting to the point where we already expect this.

Scobleizer: "So, Google has a HUGE vested interest in making sure that the weblog communities survive. Let's say that Pyra went out of business. Google would lose much of its competitive advantage (and Microsoft probably would be able to move in and improve its search offerings and maybe even offer its own weblog tool -- anyone remember that Microsoft already offers free Websites over at http://groups.msn.com ?)"
-- I disagree that this is Google's motivation. Plenty of great companies went bankrupt during the dot-com era, and Google can't go around saving them as part of a business plan. They either have something specific in mind, or it's an experiment (one that could fail, a definite possibility since I wouldn't consider Froogle or Google Answers to be successes so far and they don't take up the company's resources that Blogger will).

No Time to Think: "The concept of the 'next big thing' has been building and taking shape. It's the theory of the 'Semantic Web' meets the power of 'Google' meets the value of 'Reputation'. Call it the 'Global Clique' (although one will exist for each subject) - everyone knows everyone (either directly or indirectly), someone knows everything and lots of people know where to find it or who to ask, there is no specific or consistent relationship between the participants (they're loosely coupled), and the thought leaders and the influencers - both in general and on specific subjects - are clear. It just needed a push. Today it got a huge one."

That last one sounds better to me, and I hope that's where Google is headed. However, the great thing about Google - and the benefit that made it integral to our everyday lives - is that it searches "everything" in a distributed fashion and uses the pagerank algorithm to rank the results, all in less than one second. Adding blog link trails to Google News is a great idea (Jim McGee has a good summary), but they should have been able to do this without purchasing Pyra. Their advantage has always been the ability to index and rank content in the outside world, not on their servers. Even if they plan to add this type of functionality into the Google Search Appliance that they sell for big bucks, I would have thought it would be more impressive if they did it in a distributed fashion, without purchasing a blogging company.

My hope is that they'll build a better search engine for individual blog entries. For example, right now I'm trying to find a site that I blogged about last year. It had an RSS sidebar integrated into the main page. When you click on one of the sites listed in the blogroll, that site's headlines opened seamlessly in the sidebar. Earlier tonight, I was trying to remember if it was Gateway or Dell that is making integrated 802.11b standard on its laptops. I had a heck of a time finding it in Google or Daypop (if you're interested, I finally found it in Dell's press releases). What we really need is a more granular search engine for finding content that is unique (the thoughts of bloggers) but not unique (general concepts that are blogged by one, thirty, or a hundred people).

I'll be interested to see how this plays out, but I still think Google is missing the boat by not working closely with librarians. If they truly want to become THE place to go for information, they should continue their work on the semantic web, but there will always be information that they can't provide. The way to fill in those holes is to create a librarian-based pagerank and integrate 24/7 library virtual reference projects into their offerings. A good librarian will thrash Google Answers any day of the week.

Just imagine searching Google for something, and not finding what you need, being unsure of a site's authenticity, or tiring of paging through thousands of results. What if you could type your postal code into a box on the search results page and be connected to your local library's virtual reference service? Suddenly you have an expert at your fingertips, as well as access to subscription-only databases. Tell me that wouldn't rock!

[The Shifted Librarian]
8:49:53 PM    

Google Pyramaniacs Pry Open Enterprise Sales

Why did Google buy Pyra Labs?

"Klogging". Watch for their Google Search Appliance to come bundled with a version of the Blogger Pro server.

Search, or the lack of it, holds back intranet blogging. When everyone uses Google to search the universe, you expect blogs inside the firewall to show up too. But they don't.

Unless your Google Appliance crawls them.

This is the Lotus Notes killer. A harsh stab at the next Microsoft Office's collaboration tools. When everyone is writing in to their blog, and content is immediately available, why do you need this other stuff?

What's left to complete the picture? Two things:

  1. RSS push to the Google search crawler.
  2. A converged microcontent client.

Who's going to buy?

  • The military and security complexes.
  • Big business, especially those with a human-capital self-image.
  • Civil government: cities, states, public service agencies, larger not-for-profits.

Why buy Pyra? Klogging creates searchable, linked content, and that sells appliances.

[a klog apart]

Interesting analysis.  My 2 cents: blogs demonstrate the increasing importance of micro-content (i.e. individual posts, paragraphs, lines of IM, SMS, etc.) over traditional documents, but much of business is still driven by deliverables, aka final documents.  Microsoft understands this, which is why client-side document creation tools are at the heart of their strategy (i.e. Office & Sharepoint).  Microsoft needs to learn how to break down the document, and blogging tool vendors, while they keep innovating in micro-content, need to support traditional document management better.


8:42:14 PM    

RSS newsreaders are TiVo for bloggers

Newsreaders like NewzCrawler and Radio UserLand do TiVo things. Time shifting. Easier, more complete channel and program selection. Season pass for your favorite shows. Record in the background while playing in the foreground. Save a post to your blog instead of to your VCR.

TiVo needs blogspace community tools: add social filtering (recommendations), feedback, and threads of commentary.

[a klog apart]
8:34:23 PM    

© Copyright 2003 Micah Alpern.

 

 

