September 16, 2004 4:00 AM PDT

Microsoft flip-flop may signal blog clog

As Web logs gain in popularity, critics warn that they are increasingly becoming the Internet's new bandwidth hog.

The issue has been in the spotlight for much of this month, following a decision by Microsoft to abbreviate developer blogs both on its Web site and in syndication, citing a bandwidth crunch. The Redmond, Wash., software giant stopped delivering the full text of postings on the Microsoft Developer Network (MSDN) to blog subscribers, requiring them instead to follow a link to read the postings in their entirety. Facing a clamor of criticism from its own developers, Microsoft on Tuesday backtracked on that decision.

Microsoft's flip-flop is a red flag for large enterprises and other groups that host and syndicate bloggers. As the practice gains popularity, network administrators could face tough choices in meeting a demand that promises to put new strains on server resources.

The developments at MSDN have also raised questions about fundamental Internet and blogging protocols and practices, with the "blogosphere" erupting in debates over everything from obscure extensions to HTTP to the wisdom of group blogs and the resurrection of push technology.

"This is part of a bigger trend," said Mike Morford, a senior technologist with Packeteer, a company in Cupertino, Calif., whose software helps network administrators manage bandwidth. Blog syndication is "currently one of the best tools for sharing more information more effectively. The problem is more information takes more bandwidth, and bandwidth is not free."

That won't get an argument from Microsoft. For months, the company published whole blog entries by members of MSDN in a single aggregated feed, both on MSDN's blog page and in syndication using the RSS (Really Simple Syndication) protocol.

Bandwidth-hogging blogs
But as blogging became more popular at MSDN, the site's page sizes ballooned and bandwidth costs swelled.

On Sept. 4, the company acted to conserve resources by cutting both its syndication feeds and the blogs as they appeared on the MSDN Web site to the first 500 characters of each post. Subscribers and site visitors could follow a link to read the whole post. That economy shaved the MSDN blog page by 75 percent to about 100K.

It also raised the ire of MSDN bloggers.

"In the blogosphere, there is hardly anything more irritating (than) an abbreviated RSS feed," blogger Steve Main wrote on his blog. "The whole purpose of an RSS aggregator is so that I don't have to open my freaking Web browser to 100 different pages. By having the content right there in my aggregator, I can skim an entire article in the time it takes to open up a new Web browser. By not including full content in the RSS feed, you take away some of the productivity gains that RSS offers."

Microsoft responded to a torrent of similar blogger criticism Wednesday by restoring the full blogs in its aggregated RSS feed and upping the character limit on the Web page to 1,250 from 500.

"We were looking for ways to enhance operational efficiency," said Kevin Ledley, group manager at MSDN. "What we have now is the best of both worlds: a Web page within a reasonable size, and within the reader we offer the full text so you can consume the full blog without having to leave your reader."

Microsoft's blog abbreviation debacle comes as blogging in general and RSS specifically make inroads into more spheres of business and personal life. Blogs are now an established means of corporate and developer communications. RSS has become a technological staple for news organizations including Reuters and CNET News.com, as well as a common tool for individual bloggers to distribute their dispatches about work and home.

MSDN's struggle with its aggregated feed raised questions among bloggers about the wisdom of pulling together numerous blogs into a single feed, with many at Microsoft weighing in against the practice.

Is RSS "broken"?
But the episode also raised questions about fundamental technologies behind blogging and Internet packet transmission generally, sparking a war of words among some bloggers about RSS's ability to scale with large numbers of blog postings.

"RSS is broken," wrote Microsoft technical evangelist Robert Scoble in a Sept. 8 blog posting. "It's not scalable when tens of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes...Clearly, RSS is losing some of its advantages. More and more sites are not providing full-text feeds."

RSS has taken heat for how it is maintained, and has sustained a challenge from a newer syndication protocol called Atom. Discussions on the future of the two competing protocols are ongoing.

Scoble's post elicited a strong defense by Dave Winer, a longtime champion of RSS who still exerts substantial authority over the protocol.

Winer contested Scoble's suggestion that the issue was "thousands of separate RSS feeds." Rather, Winer said, the problem is Microsoft's decision to offer an aggregate feed of all the MSDN blog postings.

"The other guys are screaming fire in a movie theater," Winer said following his blog posting on the subject. "The solution is for Microsoft to cancel the aggregated feed."

Microsoft also played down Scoble's attack on RSS.

"I don't think there's any limitation to RSS that we know of," MSDN's Ledley said. "We were trying to do a lot with one feed and we're trying to figure out the best way to go about that. I wouldn't argue with Robert and his viewpoint, because he looks at RSS from a different perspective than I do. But that's the great thing about blogs--people have their own opinions."

In an interview, Scoble acknowledged that calling RSS broken was "overstating it a bit," but he defended his criticisms of the protocol's ability to scale.

"I know of some big publishers who are very afraid of the scalability of RSS, and that fear is keeping them from implementing RSS or Atom feeds," Scoble said.

Scoble said the problem is that RSS readers typically are set by default to query RSS servers for new content on the hour. Multiplied by thousands of blogs and many more readers, that creates a huge bandwidth demand.
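The arithmetic behind that concern can be sketched in a few lines of Python. The subscriber count and feed size below are hypothetical figures chosen only for illustration, not numbers reported by Microsoft or Scoble, and the sketch assumes every poll downloads the whole feed.

```python
def daily_polling_bytes(subscribers: int, feed_bytes: int, polls_per_day: int) -> int:
    """Bytes served per day if every poll downloads the entire feed."""
    return subscribers * feed_bytes * polls_per_day


# Hypothetical figures, for illustration only.
SUBSCRIBERS = 10_000
FEED_BYTES = 100 * 1024  # a feed of about 100K

hourly = daily_polling_bytes(SUBSCRIBERS, FEED_BYTES, polls_per_day=24)
manual = daily_polling_bytes(SUBSCRIBERS, FEED_BYTES, polls_per_day=1)

print(f"hourly polling:      {hourly / 1e9:.1f} GB/day")
print(f"one manual poll/day: {manual / 1e9:.1f} GB/day")
```

Under these assumptions, the load grows in direct proportion to the polling frequency, which is why changing a reader's default interval has such a large effect.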

Scoble's solution is to abandon the automatic hourly queries, known as "polling."

"Lots of people have a vested interest in seeing RSS expanded," Scoble said. "I want to see that too, but I want people to be aware that they are putting strain on the systems if they are polling all day long. I've personally changed my defaults to only poll manually--right before I start reading the feeds. If aggregator producers changed the defaults, that'd really help."

Scoble echoed widespread criticism of the practice--exemplified by Microsoft's MSDN feed--of aggregating many hundreds of blogs in a group feed. The alternative, initially more time-consuming for the blog subscriber, is to select and subscribe to blogs individually.

An alternative
Others are turning their hopes to a neglected extension of HTTP, the Web's fundamental transfer protocol.

That extension, RFC (Request for Comments) 3229, lets Web browsers and RSS readers request from Web servers and RSS syndicators only the information that is new since their last request.
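A minimal sketch of the idea, seen from the server's side, might look like the following. It is a simplification under stated assumptions, not an implementation of the RFC: a real exchange identifies the client's copy with an entity tag and negotiates the delta through HTTP headers, whereas here a plain integer ID stands in for that tag. The `Entry` class and `respond` function are hypothetical names. The status codes are real: 226 ("IM Used") is defined by RFC 3229, and 304 ("Not Modified") comes from HTTP itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: int  # stand-in for the entity tag a real server would use
    title: str


def respond(entries, last_seen_id=None):
    """Return (status, entries_to_send) for a feed request.

    entries      -- the full feed, oldest entry first
    last_seen_id -- ID of the newest entry the client already holds,
                    or None if the client has nothing cached
    """
    if last_seen_id is None:
        return 200, list(entries)   # first request: send the whole feed
    new = [e for e in entries if e.entry_id > last_seen_id]
    if not new:
        return 304, []              # nothing changed: send no body
    return 226, new                 # send only the entries that are new


feed = [Entry(1, "First post"), Entry(2, "Second post"), Entry(3, "Third post")]
print(respond(feed))                  # full feed
print(respond(feed, last_seen_id=2))  # only the third entry
print(respond(feed, last_seen_id=3))  # nothing new
```

The saving comes from the second and third cases: a reader that already holds most of the feed receives only the difference, or nothing at all.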

While RFC 3229 has seen little adoption on the Web, blogging enthusiasts are hoping that it will help blog syndication meet the significant bandwidth challenge facing it.

PubSub Concepts, for example, scans millions of blogs continually to provide its subscribers with alerts within seconds of when certain keywords show up. Absent widespread implementation of RFC 3229, the company finds that it is scanning the same material over and over again, a substantially wasteful exercise.

"We read 3 million RSS files a day multiple times, and most of what we read is junk," said Bob Wyman, chief technology officer of PubSub Concepts in New York. "Only about 3 (percent) to 4 percent of the data is new. From our point of view, RFC 3229 would be a lot better for us because our bandwidth consumption would go down by as much as--let's be conservative here--70 (percent) to 80 percent. It's such a large saving that we know it's going to be massive."
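Wyman's figures can be checked with simple arithmetic. The sketch below assumes an idealized case in which only new data crosses the wire and protocol overhead is ignored. The function name is hypothetical.

```python
def savings_if_only_new_sent(new_fraction: float) -> float:
    """Fraction of bandwidth saved if only the new data were transferred."""
    return 1.0 - new_fraction


for new_fraction in (0.03, 0.04):
    saved = savings_if_only_new_sent(new_fraction)
    print(f"{new_fraction:.0%} new -> up to {saved:.0%} saved")
```

On that idealized basis the ceiling is 96 percent to 97 percent, so his 70 percent to 80 percent estimate leaves room for overhead and for servers that never adopt the extension.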

Rather than polling, Wyman suggests that the blogging world revisit a concept long abandoned as an Internet nonstarter--push technology, made famous by the failed early start-up PointCast.

Instead of pulling down large feeds of blogs at regular intervals, Wyman advocates using his scan-and-notify method to selectively send or "push" blog postings to subscribers when they include specified keywords or subjects.
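The scan-and-notify approach can be illustrated with a short sketch. It is not PubSub's actual system, whose internals the company has not described here. The `KeywordNotifier` class and its methods are hypothetical, and matching is reduced to whole-word, case-insensitive comparison.

```python
import re
from collections import defaultdict


class KeywordNotifier:
    """Toy keyword-based push: posts go only to subscribers who asked for them."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # keyword -> subscriber names

    def subscribe(self, subscriber: str, keyword: str) -> None:
        self._subscribers[keyword.lower()].add(subscriber)

    def publish(self, post_text: str) -> list:
        """Return the sorted names of subscribers whose keywords appear in the post."""
        words = set(re.findall(r"\w+", post_text.lower()))
        matched = set()
        for keyword, names in self._subscribers.items():
            if keyword in words:
                matched |= names
        return sorted(matched)


notifier = KeywordNotifier()
notifier.subscribe("alice", "RSS")
notifier.subscribe("bob", "Atom")
print(notifier.publish("RSS is broken, one blogger says"))
```

In this model the publisher does the work once per post, and a subscriber receives traffic only when something relevant appears, rather than downloading an entire feed on a schedule.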

RFC 3229 took a small but significant step forward Wednesday when blogging software WordPress gained an extension that supports it.

Wyman acknowledged that headaches caused by blog bandwidth demand were showing up primarily in exceptional circumstances, such as his specialized search service and Microsoft's huge and active blogging community.

But Microsoft's problem today is the Internet's tomorrow, he warned.

"At Microsoft, anything they do they wind up with quite a large audience," Wyman said. "But they've just hit this problem sooner. Others will too."

3 comments

What a crock!
Leave it to Microsoft to focus on how to limit free speech in a new medium versus cracking down on the spammers. Gartner reports that 31% of all e-mail traffic is spam, yet Microsoft is blaming bloggers for network congestion?? Yeah, right Bill! And you believe Dan Rather????
This is why CBS and Microsoft are doomed in the long run.
Posted by (2 comments )
This is crazy.....
This whole thing is crazy.....

Web logs...blogs.....

The "real" bandwidth hogs are coming from spam and online advertising. If you want "operational efficiency", get rid of spam and online advertising. How much bandwidth is needlessly wasted on making sure that I see a dozen ads between the pulling up of CNET's website and the posting of this post? If I were to print out a map from mapquest, there would be an ad printed that I don't even see on the screen. Pictures take more than text.....

Ya know...
technically, this post is a web log. Any time you go to a website, that action is logged on a server somewhere...doesn't that constitute a weblog? What happens on blogs is really no different than what happens here, or on any other site.
Posted by Prndll (382 comments )
Blogs and Normalization theory
I think a normalization scheme for RSS and similar formats, analogous to the normalization found in databases, needs to be proposed. This would allow smart querying of the resources found in the feeds. Read about it at http://www.khaitan.org/mt/archives/000027.html
Posted by xbhatti (2 comments )