I realize what I will be addressing is several months late and not the freshest anthropology news, but with the recent news of Google and Facebook joining the data portability initiative, I think this is an opportune time to have an open discussion about paleoanthropological data and data sharing. You’re probably asking yourself, “Google and Facebook have very little to do with paleoanthropology, so why is Kambiz bringing this up?”
What Google and Facebook, as well as a lot of other companies in the ‘social networking’ business, are planning to do is allow a limited layer of people’s data to be shared across different services. This is to benefit users, so they don’t have to maintain dozens of profiles and re-add the same contacts on different services. If that confuses you, imagine a hypothetical scenario like this: when you update your Myspace profile, the changes are also reflected in your Facebook profile. Likewise, say you add a contact in Flickr and you also authorize them to add you to their YouTube account. Not only does this eliminate the chore it has become to maintain so many profiles, but it also expands our ability to use these services to really network and explore new information. But this isn’t all fine and dandy… there’s a whole slew of privacy issues that come about with this initiative, which I believe should soon be hashed out. In the long run, I imagine a certain dataset that a user chooses to release will be opened up for use in other services.
A similar initiative has been brewing in the paleoanthropological field. See, in April of 2007, lots of anthropologists gathered in New York City for a discussion on “databases, data access, and data sharing in paleoanthropology,” and a summary of what was discussed came out in the October 2007 issue of Evolutionary Anthropology. I just read the article and it got me thinking about where I can go with my Hominin Database project, which I started up last year, and how paleoanthropologists can use the headway made in the corporate/web 2.0 sphere to their advantage.
Before I get into it all, let me first outline the problems faced in the anthropology world. Paleoanthropologists specialize in finding fossils and analyzing them. With a bit of luck, and a lot more experience and professionalism in their science, they will find a fossil primate such as a hominid. After recovering the fossil from the field, they will take it to a lab to study it and compare it with other specimens. Ultimately, they draw conclusions and produce a publication based upon their research.
But what happens after this point isn’t so cut and dry.
Depending on where in the world the fossils were found, there’s a whole slew of bureaucratic and ethical constraints that prevent the fossil from being shared physically. Most institutions and countries where fossils are found have laws and regulations that do not allow the fossil to leave the country of origin. And to complicate this, a whole lot of time, money, sweat and sacrifice goes into finding and analyzing hominid fossils, so paleoanthropologists often become possessive of their find and do everything they can to prevent their fossil from being seen by others. I don’t blame them. Some worry about how they will be given credit for their find and how they can control what is concluded about the fossil in the future.
Other paleoanthropologists aren’t so possessive. They see great potential in sharing their data and have turned to the web to share. As we all know, the web is a great way to put out information and have it be used by people. There are already some motivated folk who have mobilized and created databases such as the RHOI Specimen Database and Primo, the NYCEP Primate Morphology database. But these projects have run into a major snag that Google, Facebook, Plaxo and other social networking sites are also facing… with all these different approaches to producing databases and sharing data, whether fossils or people’s profiles, these projects are becoming more and more specialized, or divergent in evolutionary terms. Already they have grown apart, and their data cannot be shared across different networks. What the big tech companies and these paleoanthropological database projects are both confronting is the challenge of creating a uniform way to share a predetermined layer of data across other networks.
I can think of one way of doing this, albeit a rather archaic one, and that is to agree upon and use a standard database structure with uniform field names and tables. Ideally this would have been done before people created their own databases, because as databases grow into their own structures and organizations it becomes a monumental, and sometimes tricky, effort to migrate data into a new structure. I’ve had the uncomfortable pleasure of banging my head against the table on far more occasions than I’d like to admit as I upgraded various databases into new structures. Sometimes, I’ve even lost data, which is often priceless! Furthermore, with people using different database software and different web development languages, it becomes an even bigger mess!
I really don’t think it’s anyone’s business to say, “Hey, if you wanna be part of the new Paleoanthropology 2.0, then you have to make your database like this or you have to use this database/language.” I really don’t think that will make even the most enthusiastic data sharers wanna join, and since people have already made and organized their data, I think it would be too much to ask them to completely rewrite their databases and import the old data into the new structure.
Instead, I think that by using standardized formats such as XML and RSS, the same tools already used for reading blogs through news readers such as Google Reader, we can accomplish data sharing and still allow people to maintain their own database design. How? Currently, there are a lot of different software platforms that drive blogs… from Blogger to WordPress to Movable Type. They all have different database structures, but how they export the data into RSS feeds is pretty much standard, uniform and, most importantly, an established technology. How do I know? I read my blogs and news in my news reader and I’ve used all the different blogging platforms. I know how WordPress uses dynamic PHP, usually on a SQL database, how Blogger generates static HTML files, and how Movable Type functions dynamically, similar to WordPress but with Perl instead of PHP. All of these platforms produce RSS feeds with a title and content layer of the data, regardless of the computer language or database format they use. This technology can be used to import and export various forms and structures of paleoanthropological data.
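To make the news reader point concrete, here is a minimal sketch in Python. The two feeds are made-up stand-ins for what two different blogging platforms might publish; the names FEED_A, FEED_B, read_items and read_all are just for illustration. The point is only that one small reader can handle both, because each item carries the same title and description elements no matter what software produced it.

```python
import xml.etree.ElementTree as ET

# Two hypothetical RSS 2.0 feeds, standing in for the output of two
# different platforms. The content is invented for illustration.
FEED_A = """<rss version="2.0"><channel>
  <title>Site A</title>
  <link>http://example.org/a</link>
  <description>Feed published by one platform</description>
  <item>
    <title>First post</title>
    <description>Body of the first post</description>
  </item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
  <title>Site B</title>
  <link>http://example.org/b</link>
  <description>Feed published by another platform</description>
  <item>
    <title>Another post</title>
    <description>Body of another post</description>
  </item>
</channel></rss>"""


def read_items(rss_text):
    """Return a (title, description) pair for every item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def read_all(feeds):
    """Read several feeds with the same code, whatever produced them."""
    items = []
    for feed in feeds:
        items.extend(read_items(feed))
    return items


all_items = read_all([FEED_A, FEED_B])
```

Nothing in read_items knows or cares whether the feed came from PHP, Perl or static files, which is exactly the property I want to borrow for fossil data.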
Now, I don’t know exactly what the conference on databasing in paleoanthropology concluded about sharing data. I wasn’t there. The Evolutionary Anthropology article indicates people agreed to make a standard database structure and use a portal site, Paleoanthrportal.org, to display the shared datasets. Again, I don’t think that’s the best way to go about doing things. People will still make databases however the hell they want. Getting them to conform to a uniform structure isn’t as easily accomplished as it seems. Rather, having a way for them to publish their data using RSS with a uniform structure is the way to go. It won’t alter the original database. Rather… during the export process, scripts will map the field names in the database to the uniform standard of the RSS file, which is an XML format. At most this will be an add-on script, run periodically on the server. Furthermore, the databases are still maintained by individual projects and no one will lose control. The RSS feeds are published regularly and aggregated with other services; in this case, a network of participating paleoanthropological databases that will accurately reflect the newest datasets. This will make the data usable on other sites.
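Here is a rough sketch of what such an add-on export script and a simple aggregator could look like, written in Python with SQLite purely for illustration. Everything specific in it is an assumption of mine: the table and column names, the sample records, the example.org links, and the choice to map each project's fields onto the plain RSS title and description elements. A real standard would need to agree on a richer set of fields.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_feed(conn, table, field_map, channel_title, channel_link):
    """Read rows from `table` and return an RSS 2.0 document as a string.

    `field_map` maps this project's column names to the shared element
    names. The database is only read, never altered.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Export of {table}"
    columns = list(field_map)
    # Table and column names come from the project's own trusted config.
    query = f"SELECT {', '.join(columns)} FROM {table}"
    for row in conn.execute(query):
        item = ET.SubElement(channel, "item")
        for column, value in zip(columns, row):
            ET.SubElement(item, field_map[column]).text = str(value)
    return ET.tostring(rss, encoding="unicode")


def aggregate(feeds):
    """Merge the items of several exported feeds into one list of dicts."""
    merged = []
    for feed in feeds:
        channel = ET.fromstring(feed).find("channel")
        source = channel.findtext("title")
        for item in channel.findall("item"):
            record = {child.tag: child.text for child in item}
            record["source"] = source
            merged.append(record)
    return merged


# Two hypothetical projects with different schemas and made-up records.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE specimens (specimen_no TEXT, notes TEXT)")
db_a.execute("INSERT INTO specimens VALUES ('A-001', 'Partial mandible')")

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE fossils (catalog_id TEXT, summary TEXT)")
db_b.execute("INSERT INTO fossils VALUES ('B-17', 'Isolated molar')")

feed_a = export_feed(db_a, "specimens",
                     {"specimen_no": "title", "notes": "description"},
                     "Project A", "http://example.org/a")
feed_b = export_feed(db_b, "fossils",
                     {"catalog_id": "title", "summary": "description"},
                     "Project B", "http://example.org/b")

records = aggregate([feed_a, feed_b])
```

Each project keeps its own tables and only supplies a small field map, while the aggregator sees one uniform shape. In practice export_feed would run on a schedule on each project's server and the aggregator would fetch the published feeds over the web.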
I understand this may have been way above most people’s heads. And I was just brainstorming too. I admit I have no foolproof way to do this. Also, I don’t imagine most of Anthropology.net’s readers are database administrators or web developers, but as anthropologists… as scientists, I think we should all be wondering how we can begin to share our data. The biologists are leaving us in the dust, with the massive GenBank and SwissProt databases (which, by the way, sync data even though they are run by two different governments). Lastly, these problems don’t just exist in the paleoanthropological subset of anthropology. Archaeologists and linguists also face similar challenges.
We should all acknowledge that these fossils and artifacts have been locked away for a long time, in the dirt and in the sediment, and we need to do our best to share them with the rest of the world rather than curate and ignore them. So please let me know what you think about my argument for sharing paleoanthropological data using RSS, or if you have a better idea on how to go about creating sharable data, please tell us!