Why scientists should learn from Aaron Swartz, part 2: on standards and frameworks

Instead of the “let’s just build something that works” attitude that made the Web (and Internet) such a roaring success, they brought the formalizing mindset of mathematicians and the institutional structures of academics and defense contractors. … With them has come academic research and government grants and corporate R&D and the whole apparatus of people and institutions that scream “pipedream”. And instead of spending time building things, they’ve convinced people interested in these ideas that the first thing we need to do is write standards.

In this excerpt from A Programmable Web, Aaron Swartz argues against the bureaucracy that slowed progress toward a semantic web. The Web and the data that reside on it, scientific or not, have been characterized by being used, re-used, cut, re-mixed, copied and mashed up. This fast, transparent sharing of data is what made the Web and revolutionized how we think about data.

A common misconception in academia is that all standards and frameworks need to be defined up front, thereby creating rigid structures. This leads to publications which posit that a community is in dire need of a new standard or “framework”. Yet all too often these works overlook easier, more flexible solutions which build on existing infrastructure and more agile, community-driven movements. At times they even lead to more fragmentation (obligatory XKCD comic below).

As Aaron carefully observed: “To engineers, this is absurd from the start — standards are things you write after you’ve got something working, not before!”

Although I acknowledge that standards are important, they carry less weight in the context of data use and re-use, where data accessibility is the limiting factor. In this day and age, if a service is not created and carried by its user community, as a software package or a larger initiative, chances are that there is little need for it (be it a service, a standard, or a framework). Diverting money to a service no one wants or needs, before it has been demonstrated to work, seems wasteful.

Creating well-documented application programming interfaces (APIs) to (ecological) data would go a long way toward facilitating interoperability without the added cost of supporting new aggregating platforms or standards (and the various committee members that come with them). In other words, talk is cheap: fast and easy access through APIs, combined with ad-hoc integration, often trumps institutional frameworks and standardization.
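
To make the idea of ad-hoc integration concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two JSON payloads stand in for responses from two independent, hypothetical data APIs, and the field names (`site_id`, `station`, `leaf_area_index`) are invented. The point is only that a few lines of glue code can join two documented sources without a shared standard.

```python
import json

# Hypothetical JSON payloads, standing in for responses from two
# independent data APIs. All field names and values are invented.
SITE_API_RESPONSE = json.dumps([
    {"site_id": "A1", "lat": 50.1, "lon": 4.5},
    {"site_id": "B2", "lat": 45.2, "lon": -72.1},
])
OBSERVATION_API_RESPONSE = json.dumps([
    {"station": "A1", "date": "2013-05-01", "leaf_area_index": 3.2},
    {"station": "A1", "date": "2013-06-01", "leaf_area_index": 4.1},
    {"station": "C3", "date": "2013-05-01", "leaf_area_index": 2.7},
])


def integrate(sites_json, observations_json):
    """Join observations to site coordinates on a shared identifier.

    The two sources name the identifier differently ("site_id" versus
    "station"); the mapping between them is handled here, ad hoc.
    """
    sites = {site["site_id"]: site for site in json.loads(sites_json)}
    merged = []
    for obs in json.loads(observations_json):
        site = sites.get(obs["station"])
        if site is None:
            continue  # no matching site: skip rather than guess
        merged.append({
            "site_id": site["site_id"],
            "lat": site["lat"],
            "lon": site["lon"],
            "date": obs["date"],
            "leaf_area_index": obs["leaf_area_index"],
        })
    return merged


records = integrate(SITE_API_RESPONSE, OBSERVATION_API_RESPONSE)
for record in records:
    print(record)
```

In a real setting the two payloads would come from HTTP requests to the respective services, and the field mapping would follow whatever each API documents. The glue code stays small either way, which is the argument being made here.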