The Evolution of XML Vocabularies
http://naeblis.cx/notebook/XMLVocabularyEvolution/
Copyright © 2004 Ryan Tomayko (rtomayko@gmail.com)
2004-06-16T07:53:00-04:00

Dag vs. Fedora
http://naeblis.cx/notebook/XMLVocabularyEvolution/Dag
2004-06-16T07:53:00-04:00

Dag runs an extensive repository of packages for RPM-based GNU/Linux distributions. There is constant strife between his self-run, low-process methods of maintaining a repository of packages and those of the Fedora.us people.

Another example. Dag runs an extensive repository of packages for RPM-based GNU/Linux distributions. There are two cults here: the Fedora Extras (fedora.us and livna repositories) and the “3rd Party Repositories” (Dag, freshrpms, NewRPMs, and atrpms [accuracy?]). The “3rd Party Repositories” are at the lowest level of the evolutionary chain. There is little barrier to releasing new packages. The Extras (fedora.us) repository sits in the middle of the chain. There are formal guidelines for package releases and tools for managing bugs, feature requests, and so on. At the highest level of the evolutionary chain is the Fedora Core distribution itself, with even more process around getting new packages established.

There are serious issues with this situation. The linked thread has Dag defending the 3rd party repositories' existence. The Extras people look down on the 3rd party people because they have less process around packaging standards and quality control. What they fail to recognize is that having a level where these measures are less restrictive is healthy, because it provides a place where people can play with packages on the ground floor. A ton of weeding out of useless packages occurs here.

The 3rd party guys should probably be trying harder to push their stuff up to the next tier, though. People who find packages they like in 3rd party repos should be shepherding them into the official Extras repos.

The Slashdot View on RSS/Atom
http://naeblis.cx/notebook/XMLVocabularyEvolution/SlashdotWanks
2004-06-12T21:10:54-04:00

The mean opinion on the Atom/RSS situation.

An article from Slashdot on how Google might support RSS over or in addition to the more recent Atom syndication format.

The comments on this article are amazing. First, it is enlightening to see that a majority of “technical” people have little understanding of the various standardization processes or of why multiple bodies even exist. This is Slashdot, so of course “not understanding” takes the form of people asserting inaccurate statements as fact instead of asking questions.

Ignorance aside, there are some beautiful examples of the types of things I want to speak to. For example, this is a really good question:

Why did atom even come into existence? Was not RSS already established, or is there some kind of deficiency in RSS that I'm missing here?

And here is a commonly held response:

If we didn't keep reinventing the wheel then society would be plagued with unemployed wheel inventors with nothing to keep them busy. It would be a nightmare.

What I'd like to get at here is that what we are seeing with RSS/Atom is evolution, not reinvention. RSS/Atom is such a great example of what I would like to explore because it shows the ugliness that must occur in the evolution of popular data formats and the systems that use them. These things should start out extremely primitive and specific and be thrown out there so that they can be tested for whether they have any value at all. Once some critical level of value is reached, you need to formalize a little bit. And then a little bit more. Eventually you reach a point where you have a decent idea of what the problem domain is, and you go back and attack it with a clean slate (Atom).

So the overall point I would like to make is that we need to look for patterns that tell us when to move to the next stage in the evolution of a format or system. Instead of attacking those who recognize these patterns, we should embrace them and their ideas and move ahead. The first wheels were probably square, or maybe there were all shapes of wheels being developed in different places for different things. And then people using square wheels got a chance to see people using triangle wheels, and people using triangle wheels saw people using round wheels. Then they all started talking, and the square wheel guys had the right material, the triangle wheel guys had the right ratios, and the round wheel guys had the right shape. So they decided to agree on what wheels should have in common, and from this come the best-of-breed wheels we have today. But if people weren't out there “reinventing” the wheel, wheels might still be square. And if the square wheel guys had had to wait for a standards body before they created their primitive and shitty wheels, we might not have wheels at all.

Elliotte Rusty Harold on xml-dev
http://naeblis.cx/notebook/XMLVocabularyEvolution/ElliotteRustyHarold
2004-06-12T23:51:08-04:00

Elliotte Rusty Harold pushes an information model where XML data is very loosely defined (as in no schemas) between producers and consumers.

An interesting thread on the xml-dev mailing list where Elliotte Rusty Harold is pushing an information model where XML data is very loosely defined (as in no schemas) between producers and consumers. This seems to be in line with ideas put forth by Walter Perry over the past eight years, suggesting that Standard Data Vocabularies are the wrong way to go and that everyone just needs to put data out there as XML. Tools like XPath and XSLT will allow different parties to interoperate. The result is very little barrier to publishing data, at the expense of requiring consumers of that data to implement at least some customized selection or transformation logic.

I agree with this view, and it forms a large part of the foundation of the databank concept. That is, you need to be able to publish data quickly, whether a standard is available or not, and without having to define a formal schema. This is the first stage in the evolution of a data format. There is a high level of incompatibility between vocabularies describing the same thing, but people are kicking the tires and they can do it quickly. At some point this gets out of control and people need to come together and agree on how to provide that information in a common way, bringing the data format to stage 2. However, until you reach some level of usefulness, going down the schemas/standards road is a big waste of time because the requirements are immature and the use cases are weakly defined (all the SOAP stock quote / weather examples, for instance).
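
To make the stage-one tradeoff a little more concrete, here is a rough Python sketch of what the consumer side might look like. Everything in it is made up for illustration: the two producers, their element and attribute names, and the example.org URLs are all hypothetical, and it uses the limited XPath subset in Python's standard xml.etree.ElementTree rather than full XPath or XSLT. Each producer publishes the same kind of thing (a list of posts) in its own schema-less vocabulary, and the consumer writes one small selection function per producer to pull them into a common shape.

```python
import xml.etree.ElementTree as ET

# Two hypothetical producers publishing the same information
# in unrelated, schema-less vocabularies.
PRODUCER_A = """<entries>
  <entry><title>Dag vs. Fedora</title><link>http://example.org/a/1</link></entry>
  <entry><title>Square wheels</title><link>http://example.org/a/2</link></entry>
</entries>"""

PRODUCER_B = """<log>
  <post name="Round wheels" href="http://example.org/b/1"/>
</log>"""


def read_producer_a(xml_text):
    """Consumer-written selection logic for producer A's vocabulary."""
    root = ET.fromstring(xml_text)
    return [(e.findtext("title"), e.findtext("link"))
            for e in root.findall("./entry")]


def read_producer_b(xml_text):
    """Consumer-written selection logic for producer B's vocabulary."""
    root = ET.fromstring(xml_text)
    return [(p.get("name"), p.get("href"))
            for p in root.findall("./post")]


# The consumer merges both sources into one list of (title, link) pairs.
posts = read_producer_a(PRODUCER_A) + read_producer_b(PRODUCER_B)
for title, link in posts:
    print(title, link)
```

Neither producer had to agree on anything before publishing, which is the low barrier described above. The cost shows up as one reader function per source, and that is the kind of pain that eventually pushes people toward agreeing on a common format.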