Why scientists should learn from Aaron Swartz.

“He wanted openness, debate, rationality and critical thinking and above all refused to cut corners.” — Lawrence Lessig

Aaron Swartz helped draft the RDF Site Summary (RSS) standard at age 14 and was in many respects a prodigy. As Lawrence Lessig wrote about Aaron: “He wanted openness, debate, rationality and critical thinking and above all refused to cut corners.” Sadly, he died by his own hand while facing particularly severe legal action over his bulk download of academic articles from JSTOR. The documentary The Internet’s Own Boy: The Story of Aaron Swartz pays homage to his life and work.

He left a legacy of writings whose clarity and brilliance I’ve rarely encountered elsewhere. This is all the more striking given the young age at which many of these blog posts and essays were written. Few people come close to the way Aaron articulated his ideas in writing.

In a series of blog posts, I’ll summarize some of his ideas on technology, politics and media within the context of contemporary scientific (ecological) research. The fact that his ideas and his vision remain key to what I consider solid scientific practice reflects his genius and insight.

release late, release rarely (release early, release often)

In a blog post written on July 5, 2006 (release late, release rarely), Aaron outlines how to develop software. Yet this essay could just as well apply to scientific research, going from idea to publication.

Like the software (pet) projects that are the subject of Aaron’s blog post, science projects often have strong emotions attached to them. While these emotions are genuine, the content or quality of the research might not pass muster.

“When you look at something you’re working on … you can’t help but see past the actual thing to the ideas that inspired it… But when others look at it, all they see is a piece of junk.”

In science, this basically means that you should do your homework and not oversell your research. In peer review, reviewers will see through inflated claims and, rightfully so, reject manuscripts because of them. So when you publish, release late and aim for quality, not quantity. This will raise the chance of getting your work published, while at the same time increasing the likelihood that you stumble on errors before anyone else does. Raising the true quality, or merely making the work look good, often highlights inconsistencies you can’t move past in good conscience.

“Well, it looks great but I don’t really like it” is a lot better than “it’s a piece of junk”.

However, releasing work late also means that no one knows what you are doing, and you might miss out on key feedback. So, informally, research benefits from releasing early.

“Still, you can do better. Releasing means showing it to the world. There’s nothing wrong with showing it to friends or experts or even random people in a coffee shop. The friends will give you the emotional support you would have gotten from actual users, without the stress. The experts will point out most of the errors the world would have found, without the insults. And random people will not only give you most of the complaints the public would, they’ll also tell you why the public gave up even before bothering to complain.”

Releasing early means that you get valuable feedback that might otherwise not make it into a high-quality paper (released late). This feedback comes not only from experts but, as Aaron correctly observed, from everyone within a larger (research) community.

In short, scientific communication and progress require a split approach: manuscripts should be released as late as possible, with ideas mature and solidly supported by open code and data that were released as early as possible.

Note: Although the argument can be made that conferences serve the purpose of “early releases”, I have yet to see a conference where people present truly early work. Most of the time, the work presented is either published or nearly published.