Microsoft's 'Custom XML' patent suit could put ODF at risk
The infamous U.S. District Court for the Eastern District of Texas has slapped Microsoft with a permanent injunction that "prohibits Microsoft from selling or importing to the United States any Microsoft Word products that have the capability of opening .XML, .DOCX or DOCM files (XML files) containing custom XML," according to CNET. This likely won't stop Microsoft Office from shipping, as CNET's Ina Fried writes, but the bigger question may be whether the lawsuit will reach beyond Redmond to also threaten the Open Document Format (ODF).
The lawsuit doesn't affect all of Microsoft Office, but only Word, and only the "Custom XML" functionality, as ZDNet's Mary Jo Foley highlights. Even so, you can almost hear the cries of jubilation from the open-source community, happy to see Microsoft get a taste of its own patent saber-rattling.
However, Gartner analyst Brian Prentice raises a troubling question: does the patent also affect the ODF standard?
The more I read through the patent claim the less confident I was with my initial reaction. In fact, I think this one might actually have some legs. Keep in mind that this claim was filed back in 1994. The claim considers the existing state of the art at that time....
One thing seems clear to me - this is not a typical rubbish software patent that earns its filer a 20 year monopoly on the dead obvious. Fifteen years ago this would seem to me to have been an innovative idea....
But, if the validity of the patent is upheld then the immediate question is whether this will also impact ODF. If so, then this turns out to be a significantly more important issue and one which will crystallize the fury of the anti-patentistas. No longer will this be the source of some Schadenfreude at Microsoft's expense. This will be seen as yet another attack on open standards and open software.
It's an interesting question, one to which I don't know the answer, not having reviewed the patent in any detailed form. But it's at least a poignant reminder that the collateral damage in any patent infringement lawsuit could well extend beyond the initial target, in this case hitting the open-source world even as Microsoft is smacked around.
The current version of ODF doesn't include Custom XML-type code, as Redmond Magazine writes, but the next version will. Could this patent suit make momentary friends of Microsoft and the open-source community?
Anyone that can offer color commentary on the patent and ODF in particular?
Update: See Sean Michael Kerner's post, which argues that two particulars (i4i is not a patent troll, and i4i and Microsoft had a business relationship) mean the open-source world has little to fear from this suit.
Matt Asay brings a decade of in-the-trenches open-source business and legal experience to The Open Road, with an emphasis on emerging open-source business strategies and opportunities. Matt is vice president of business development at Alfresco, a company that develops open-source software for content management. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure. You can follow Matt on Twitter @mjasay. 




Poking through an .odt file again (it's just a compressed archive, easily recognised by 7-Zip), there are four .xml files in the root. Although one is 'content.xml' and one is 'styles.xml', the styling is applied in a fashion similar to CSS (stylesheets are referred to as prior art in the patent's 'background' section) or old-school HTML. If there's anything in the current file format that can specify formatting without reference to a type of tag, or without being called specifically inline within the content.xml file, then I can't see it, and I don't have the time right now to go through the whole ODF spec :op
Whether the planned 'custom XML'-type functionality infringes, I haven't looked at either.
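For anyone who wants to repeat that poking around without 7-Zip, here is a minimal Python sketch. It assumes the package is an ordinary zip archive, as described above. The helper names are made up for illustration, and the sample archive is a stand-in built in memory, not a valid ODF document; the member names other than 'content.xml' and 'styles.xml' are assumptions about a typical package.

```python
import io
import zipfile


def list_xml_members(package_bytes: bytes) -> list[str]:
    """Return the names of the root-level .xml members of a zip package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.endswith(".xml") and "/" not in name
        )


def build_sample_package() -> bytes:
    """Build a tiny stand-in archive in memory (NOT a valid ODF document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Assumed member names; only content.xml and styles.xml are named above.
        for name in ("content.xml", "styles.xml", "meta.xml", "settings.xml"):
            archive.writestr(name, "<placeholder/>")
        # A nested member, to show that only root-level files are listed.
        archive.writestr("META-INF/manifest.xml", "<placeholder/>")
    return buffer.getvalue()


print(list_xml_members(build_sample_package()))
```

To inspect a real file, read its bytes with `open(path, "rb").read()` and pass them to `list_xml_members`.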
I agree that this is a disconcerting development but there are possibly some programming methodologies that have been around long enough (pointed out by someone elsewhere) to question whether even in 1994 it was entirely original. It is, after all, a use of relatively simple data structures compared to what can be used and was even around in the 1960s.
My how soon we forget.
TeX is nearly 30 years old and no word processor can touch it.
It's such a broad, vague patent that it doesn't deserve to be on the books in the first place. i4i is just a patent squatter in this case, and Microsoft hasn't done anything wrong here. All this is, is a company that wanted to "patent" an "invention" back in 1998 and has finally made its move nine years later, in 2007. What have they done? How have they licensed the patent out? Have they ever done that before? Why is Microsoft the ONLY entity to run into this patent infringement?
It's simple. They just want money. I think I'll patent air. Like, I'm thinking breathe it in. I mean, think of all the money I could make with such a broad patent. If you breathe, you owe me money. I think that's a good use of the US patent system.
The question is, how fair is it that XML is a system that allows for custom XML schemas, but then someone is allowed to patent the ability to render separate files from said customized XML?
You obviously know nothing about this at all, or you wouldn't even have uttered half of what you just did. This company has been producing a product from day one and it is licensed to a lot of people and in use in some of the largest document management systems in use today by the government and corporate sectors.
When you get the initiative to do more than ramble and spend some time doing patent searches for prior art, feel free to thank this "squatter" for providing the system that allows you to do the searching.
Performing to a basic level of acceptability, as millions of the less ambitious yet not criminal Americans do every day, does not make them always right, does not mean they can do no wrong and does not decide whether this patent is valid. Someone further down is claiming that software from 1984 has the functionality that this patent describes. If true, it doesn't matter how far and wide their product is sold; what will matter is whether they invented this methodology or whether someone else did in 1984.
The side story to this YetAnotherPatentDebacle is the failure by the pioneers of the markup industry to recognize the need to organize the IP issue early. SGML was perceived as an IBM invention and resisted in the market as IBM was then The Big Bad. As a result of that and the web's own hubris regarding HTML and XML being innovative (invention started here - right!), the history of prior art gets murky in 1994, when the markup community put its blinders on about the damage that would result from not resolving the issues of IP.
The historian's question is: would HTML have had the primacy it enjoys without the support of communities that forswore the patents to ensure it would?
What would a web without markup look like?
That asked, it would be in the markup market's best interest to collect as much information and examples of the SGML technologies as it can for prior art archives. We're at the place where the pioneers will begin to disappear and the human memory that has been keeping record will go with it.
It's like YouTube before it was purchased by Google. Viacom didn't do much before Google got their hands on it. Look what happened when they did.
"Anyone that can offer color commentary on the patent and ODF in particular?"
Cool!
"Jack Rabbit, Jack Rabbit, Jack Rabbit...."! Now, just how many more "hats" (Patent Claims) are there to be raised around the world - whether against the Microsoft Corporation et al; or, against the Open Source Communities!
This goes back to 1984.
I feel sorry for MS. They try to move to a more open format, and all they've gotten from it is grief, in many cases from the same open source zealots that were calling for it to begin with. I understand that it's not /the/ open document format, but the open document format couldn't do everything they wanted (spreadsheet 1.0 options were ridiculous). Either way, MS went open, people got pissed that they went open + custom to do what they needed, and now it turns out to be infringement.
Someone on the Office team is just staring at old code on his display right now, pondering how much money they could have saved if they had just stuck with their own format.
The open source industry as a whole is teaching Microsoft to stay closed. It's not financially advantageous to open up for MS.
What I have read by jumping around to about twenty different sites and reading up on this:
1. This company is NOT a patent IP troll company. They are not squatting on anything. They are a successful company that has had the product in question since the late 1990s, and this issue did not come into play until MS released Word 2003. They have been trying to negotiate with MS since 2003 on this matter and MS has ignored all attempts to get them to license it. Asking someone to license your work is not being a troll, it's being responsible to your employees.
2. This patent is being looked at by a lot of people, who after analyzing it are saying it may just be pretty solid, even when compared to the formats everyone keeps mentioning.
3. MS has had six YEARS to try to fight them on this and file to have the patent invalidated and even after losing a $200 million ruling they still don't appear to be doing so. Instead they are pretty much asking the Open Source Community that they love to threaten with patents all the time to come up with something. Which makes me think that their whole legion of lawyers and researchers couldn't find anything in the past six years.
4. This company never asked for $200 million; they wanted $25 million to cover those six YEARS of infringement, and they just want to be paid the licensing fees that they deserve for their work. I am not sure how $25 million works out in normal income when spread out over six years. But I would think they were making some decent coin back when this all started, seeing as they were winning all kinds of government contracts for the FDA, the Smithsonian, and the largest government software contract ever with the US Patent Office. I think what they asked for was just the normal licensing fees that they would have collected since Word 2003 was released and maybe some small amount for expenses for the court cases and lawyers.
I know I sound like I am championing this company, but it really is not about them to me as much as it is about getting tired of almost every story, about every company and every product, deteriorating into a bunch of posts about how they are a patent troll company, or this was done by X before so nothing to see here, or this or that company never innovated anything ever, blah, blah, blah.
When did all these tech sites become such a pool of negativity that never goes past the quick response? Half the time I don't think the majority of people read the stories; they just jump ahead to the comments and regurgitate what previous posters have said. God forbid they actually go and read up more on the issue on Google or Bing.
The big issue I've had is how precisely it applies to what Word is doing. What I learned in this post (and did not see yesterday) is that the infringement apparently applies to Word's "Custom XML" (whatever that is). OK, so having not seen "Custom XML", perhaps that does align with what the patent says.
However, having read the Custom XML explanation (high-level though) and other posts here about other systems that separated text from formatting, I still have to question whether they invented something new. I think it is reasonable to still question the validity of the patent.
And why didn't Microsoft work to have it invalidated? I do not know, but I am guessing that they felt no reason to take the matter to the USPTO. If they felt like they did not violate the patent, they had no reason to declare it invalid -- perhaps it is perfectly valid in another context, after all. But, now that a court has said they violated the patent, Microsoft has strong grounds for making an argument that the patent is invalid if they can show even one example where what they did was done before. Showing that -- and it might just be a matter of discussing T/Maker's WriteNow -- and then citing how the patent relates to that pre-existing technology would make the patent invalid. As I said, they didn't have grounds to argue this until it was shown they were in violation, really.
As for whether they're a troll: well, one could still argue that they are. Just because you have a technology patent does not mean you need to go sue people over it. Was Microsoft's use of that patent causing harm to i4i? I doubt it very seriously. Patent trolls usually sue people as a means of earning revenue through patents versus running a real business. So, I can see why people might make such claims here.
The patent has nothing to do with the structure of the saved document (ODF, DOCX) but with the content of the document.
It is like someone patenting a word template to create a Fax cover and then suing when MS includes a template to do the same thing.
I am a developer and I still think that software patents are ridiculous. I am also a Free software (FLOSS) user and advocate and in this case I must be consistent. MS should have won this one.
Otherwise: agreed.
This was WAY before even SGML. Way before XML was called XML.
Current accounting programs still do exactly this for check printing.
I believe IBM PROFS worked the same way for formatting documents.
Bad patent decision.
The i4i Patent 5,787,449 describes a document format and a method of encoding where the document content "is totally unstructured and has no embedded metacodes in the data stream." It further states that the document structure definition is described in a separate "metacode map" where "for each metacode applied to the content an entry in the metacode map is created which describes the metacode and gives its position."
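To make the quoted description concrete, here is a rough Python sketch of the idea as I read it: raw content with no embedded codes, plus a separate map whose entries each give a metacode and its position. The names `MapEntry` and `apply_map` and the sample tags are my own illustrations, not terms or structures taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    """One hypothetical metacode map entry: a code and where it applies."""
    metacode: str   # e.g. "<title>"
    position: int   # character offset into the raw content


def apply_map(raw_content: str, metacode_map: list[MapEntry]) -> str:
    """Re-insert each metacode at its recorded offset to get a marked-up view."""
    pieces = []
    cursor = 0
    # sorted() is stable, so entries sharing a position keep their map order.
    for entry in sorted(metacode_map, key=lambda e: e.position):
        pieces.append(raw_content[cursor:entry.position])
        pieces.append(entry.metacode)
        cursor = entry.position
    pieces.append(raw_content[cursor:])
    return "".join(pieces)


raw = "Patent suitBody text."          # content area: no codes at all
entries = [
    MapEntry("<title>", 0),
    MapEntry("</title>", 11),
    MapEntry("<p>", 11),
    MapEntry("</p>", 21),
]
print(apply_map(raw, entries))  # <title>Patent suit</title><p>Body text.</p>
```

The point of the sketch is only that the content string never contains a tag; all structure lives in the separate list.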
On the other hand, "XML (Extensible Markup Language) is a way of marking up structured documents, that is, documents in which the markup primarily indicates the content's purpose rather than its formatting." Although the content formatting description is stored separately from the structured content, the formatting description does not have to exist at all --- the document interpretation can be based solely on the XML tags which state the content's purpose.
Anyhow, similarities between XML and the i4i patented method exist, but the differences are clear. XML document content is structured and mixed up with markup tags, while the i4i format explicitly states that the document content "is totally unstructured" and stored in the "raw content area."
Therefore, the i4i patent does not affect the XML format at all; it affects the non-XML-specific proprietary extensions Microsoft uses in the XML-based MS Office document format(s). These are the proprietary extensions Microsoft "invented" to screw up other software developers who may try to develop open/save procedures for Microsoft proprietary "custom XML" documents. Well, I guess the chickens have come home to roost.
- Whatever analysis we do about the patent claims, isn't Microsoft in quite a peculiar situation?
- To win, they've got to use the same arguments that parties who question their patent claims would use, don't they?
To me it looks like a bad choice by MS to let this go to court. To win in court would, in my understanding, be a Pyrrhic victory.
Why? The patent requires that the document is split into two parts - one part containing just the text, and another containing formatting codes that reference the locations within the first part to which they apply. This is roughly how Microsoft's Custom XML feature works - with it, you stick an arbitrary XML file into the document, then the main document XML file pulls in text from it which is referenced by its location (specified as a path in the XML file).
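Here is a toy Python sketch of that "pull in text by path" arrangement. To be loud about the assumptions: the `<field path="...">` element, the sample XML and the function name are all invented for illustration. This is not Word's actual Custom XML schema or API, only the general shape of a main document referencing locations in a separate XML part.

```python
import xml.etree.ElementTree as ET

# Hypothetical separate data part: an arbitrary XML file stored with the document.
DATA_PART = (
    "<invoice><customer><name>Ada Lovelace</name></customer>"
    "<total>42.00</total></invoice>"
)

# Hypothetical main document: placeholders point into the data part by path.
MAIN_DOCUMENT = (
    '<doc><p>Bill to: <field path="customer/name"/></p>'
    '<p>Amount due: <field path="total"/></p></doc>'
)


def resolve_fields(main_xml: str, data_xml: str) -> list[str]:
    """Fill each placeholder with the text found at its path in the data part."""
    data_root = ET.fromstring(data_xml)
    doc_root = ET.fromstring(main_xml)
    for field in doc_root.iter("field"):
        target = data_root.find(field.get("path"))
        field.text = target.text if target is not None else ""
    # Return the rendered text of each paragraph.
    return ["".join(p.itertext()) for p in doc_root.iter("p")]


print(resolve_fields(MAIN_DOCUMENT, DATA_PART))
# ['Bill to: Ada Lovelace', 'Amount due: 42.00']
```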
From what I can tell, the ODF answer to this that I've seen looks rather different. You don't stick in an arbitrary XML file - instead, you insert the text into the actual document proper together with metadata to allow it to be extracted later. This doesn't infringe the patent in question, not even close.
This is in stark contrast to how XML works in that the structure of the document is marked up directly within the document. Yes, the patent in question mentions SGML and gives an example of SGML content. But in that example, the method strips out the tags from the SGML content and stores them separately. This is not the same as a CSS stylesheet referring to a specific element within an XML document. In that case, the XML document still retains its structure as tags within the document itself. Only the instructions on how to format the document are separate from the document. The document still retains all of its structural metacodes.
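The stripping step described above can be sketched in a few lines of Python. This is my own simplification, assuming tags are anything between angle brackets and that positions are plain character counts into the stripped text; the function name and regex are illustrative, not the patent's algorithm verbatim.

```python
import re

# Simplifying assumption: a "metacode" is anything of the form <...>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def split_markup(marked_up: str) -> tuple[str, list[tuple[int, str]]]:
    """Separate a marked-up string into raw content and a (position, code) map."""
    content_parts = []
    metacode_map = []
    content_length = 0   # characters of raw content emitted so far
    cursor = 0
    for match in TAG_PATTERN.finditer(marked_up):
        text = marked_up[cursor:match.start()]
        content_parts.append(text)
        content_length += len(text)
        # Record where in the *stripped* content this code takes effect.
        metacode_map.append((content_length, match.group()))
        cursor = match.end()
    content_parts.append(marked_up[cursor:])
    return "".join(content_parts), metacode_map


print(split_markup("<b>Hi</b> there"))
# ('Hi there', [(0, '<b>'), (2, '</b>')])
```

Compare this with the CSS case: there the `<b>` tags would stay in the document and only the styling rules would live elsewhere.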
To be clear, the method described in the patent does not pertain to the use of XML files themselves. The patent merely uses an SGML file as an example of the type of file that could be converted to the "new" format described within the patent.
What is unclear to me is whether the patent absolutely requires that all metacodes be stripped out of the original document when being converted to this "new" format. Although I could imagine a way to simply take an existing document (with its current metacodes included) and add a supplemental formatting/structure map that provides additional information about how to format that original document, it seems to me that the patent only makes claims about using their method with documents wherein the content is entirely devoid of any metacodes whatsoever. Perhaps this is due to a fundamental misunderstanding I may have about patent law: whether all major claims have to be violated, or whether violating any one claim is enough, to be considered infringing on the patent.
While I do not know exactly how Microsoft's "Custom XML" works, it seems that it would have to work by storing the actual address (or index) of the location within the original document where the "Custom XML" should be applied, if it were to be considered infringing on this patent at all. If "Custom XML" works by referencing existing structural metacodes embedded within the original document then that would be no different from any other use of XML or CSS or XSLT. Perhaps only a Texas judge could possibly be confused about the distinctions.
Continued in next post:
Say I have any text document. I want to create a template using this document that can then map onto other similar text documents. Therefore, I take the text document and start tagging it with XML. Then, finally, I remove the text. This creates the "patented" condition of XML and text stored separately. My final move is to use the XML template to mark up other text documents automatically.
This is exactly what i4i did for the US Patent Office, to put all their patents into an XML format. And it's exactly what any competent programmer would do given a similar task.
- by GrantSR August 16, 2009 2:00 PM PDT
- Continued from previous post.
Another thing I have to say about this patent is that it talks a lot about a means to do this and a means to do that but doesn't actually specify the means to do anything other than strip out the metacodes and store them separately. Hell, a high-school programming student could have told you that. You work through the document and store one part in one file and the other part in another file. The algorithms listed in the patent simply use character count to indicate where the codes are supposed to have an effect. Again, basic programming 101. How this patent is non-obvious in any way completely eludes me. Some may say that the basic "idea" of storing metacodes separately from the content was likely unique back in 1994. I beg to differ. While I can't cite any prior art, I know what I was doing back then and storing formatting codes separately from content is something I certainly would not have considered novel or unique in any way. In fact, I would have considered it pretty stupid. Which brings me to my last point:
All in all, I think the "idea" contained within this patent is absolutely wrong-headed and back-asswards. The filers claim that storing the content entirely separate from any structural or formatting information makes it easier to treat the content separately from the formatting and structure. However, since they use character count to indicate where, within the content, the formatting/structure information is to be applied (what they call mapping), every time an extra character is added to the content, all the indexes associated with the formatting data in that separate formatting file have to be updated. The filers claim their "idea" makes it possible to have multiple different formatting/structure files which could each be applied to the same content file. However, this means all those different formatting/structure map files would have to be updated if the content file were modified. What if those formatting/structure map files were stored somewhere separate from the content file? What if the editor of the content didn't have access to the formatting/structure map files? What then? It would be a nightmare far worse than the problem the filers claim to solve. The fact that some people have been willing to pay lots of money to use this "idea" means nothing. It is still a bad "idea."
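The maintenance burden described above is easy to see in a small Python sketch. It assumes each map is a list of (character offset, code) pairs; the function name and the rule that a code sitting exactly at the insertion point moves right are my own arbitrary choices, not anything the patent specifies.

```python
def insert_text(
    content: str,
    maps: list[list[tuple[int, str]]],
    offset: int,
    new_text: str,
) -> tuple[str, list[list[tuple[int, str]]]]:
    """Insert text into the raw content and re-index EVERY map that refers to it."""
    updated_content = content[:offset] + new_text + content[offset:]
    shift = len(new_text)
    updated_maps = [
        # Assumption: codes at or after the insertion point move right.
        [(pos + shift if pos >= offset else pos, code) for pos, code in one_map]
        for one_map in maps
    ]
    return updated_content, updated_maps


bold_map = [(0, "<b>"), (2, "</b>")]      # one formatting view
italic_map = [(3, "<i>"), (8, "</i>")]    # a second view of the same content
content, maps = insert_text("Hi there", [bold_map, italic_map], 2, "!")
print(content)  # Hi! there
print(maps)     # [[(0, '<b>'), (3, '</b>')], [(4, '<i>'), (9, '</i>')]]
```

Adding one character forced an update to both maps. If either map were stored somewhere the editor could not reach, it would silently go stale, which is exactly the scenario the questions above are worried about.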
I much prefer the model chosen by the designers of XML wherein the structure is considered an integral part of the document. In fact, in many documents, without the structure the content is meaningless. If you change the structure, even if you haven't changed the actual text of the content, you definitely have a different document and it should no longer be considered to be the same "content" as proposed by the filers of this patent.
If those overseeing ODF plan to include similar technology in the next version then I think it is a good thing this brouhaha has come up. Hopefully this will prevent them from making a serious mistake. If you want to be able to use different style templates then just choose a different CSS stylesheet. Every stylesheet doesn't have to make use of every class marked within the XML document. If you want to change the structure just use XSLT and transform the original as you need it. The original is always there to be transformed again when you need it again. XML technology solves all of the problems supposedly solved by this patent with none of the associated pitfalls.