Open Data. What have we got?

I attended the ‘Global Graph’ session at the #cetis09 conference and made a largely failed attempt to demo some of the work we’ve been doing with Triplify and the Talis Platform. (In my defence, it wasn’t a planned demo and jiscpress.org was down while Alex was doing some design work.)

Anyway, what I would have shown was how each document site on jiscpress.org uses Triplify to provide Linked Data in the form of RDF/N3 triples, which we store on the Talis Platform using a plugin Alex wrote.

Using Alex’s config file for WordPress MultiUser, we drop the triplify directory into the WPMU root directory, alongside wp-admin, wp-includes and all the other WordPress files. You should take a look at the config file and make sure it’s doing what you want it to do, but it will work as it is.  With this in place, Linked Data in the form of an RDF flat file for each document site (blog) is available at http://document.jiscpress.org/triplify or http://jiscpress.org/document/?triplify

(I should warn you that none of the URLs in this post are genuine URLs. They’re examples of syntax. The server at jiscpress.org will stop running at the end of December.)

Now, to get that same data onto the Talis Platform, Alex has written a plugin for WPMU that periodically crawls the documents for changes and pushes the new data to a Talis Platform account.  Here are the WPMU site-wide admin options:

Admin settings

and here are the per document site user settings:

User settings

I won’t explain what the plugin does in detail. Click on the images above to see the available options; if you’re reading this, you probably know what it’s all about. The Talis/Triplify plugin for WPMU will appear on http://wordpress.org/extend/plugins in the next couple of weeks. It’s been tested and does what we expect, but we want to test it more on sub-directory installs before it’s publicly available. Full documentation will appear soon on http://code.google.com/p/jiscpress/wiki/Documentation
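For readers who want a rough picture of the crawl-and-push idea, here is a minimal sketch in Python. It is not Alex's plugin (which is a WPMU plugin), and it makes no claims about the Talis Platform API: `fetch` and `push` are hypothetical callables you would supply yourself, and the change detection by content hash is an assumption made for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def push_changed_documents(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    push: Callable[[str, bytes], None],
    seen: Dict[str, str],
) -> List[str]:
    """Push only the documents whose RDF differs from the previous run.

    `fetch` and `push` are hypothetical callables supplied by the caller;
    `seen` maps each URL to the SHA-256 digest of the last body pushed.
    """
    pushed = []
    for url in urls:
        body = fetch(url)
        digest = hashlib.sha256(body).hexdigest()
        if seen.get(url) != digest:  # new document, or its content changed
            push(url, body)
            seen[url] = digest
            pushed.append(url)
    return pushed
```

Run on a schedule with the same `seen` dictionary, a loop like this would only send data for document sites that have changed since the last pass.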

We have also developed a WPMU plugin for Open Calais and the Yahoo! Term Extraction API. This provides a background service which indexes each document section (blog post) and creates relationships between content across the platform. We’ll post here about that very soon.

In addition to the Linked Data, JISCPress, using digress.it on WordPress, provides a long list of other open data (not Linked Data) end-points which might be put to good use. Here you go.

Document paragraphs

These are switches that provide individual paragraph data in different formats.

http://test.jiscpress.org/?p=15&digressit-embed=1&format=xml

http://test.jiscpress.org/?p=15&digressit-embed=1&format=text

http://test.jiscpress.org/?p=15&digressit-embed=1&format=rss

http://test.jiscpress.org/?p=15&digressit-embed=1&format=html

http://test.jiscpress.org/?p=15&digressit-embed=1&format=json
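If you want to generate these paragraph URLs in a script, something like the following would do. It's a minimal sketch: the function name is made up for illustration, and the list of formats is simply the five shown above.

```python
from urllib.parse import urlencode

# The five formats listed in this post; anything else is rejected.
FORMATS = ("xml", "text", "rss", "html", "json")


def paragraph_embed_url(site: str, post_id: int, fmt: str) -> str:
    """Build a digress.it paragraph-data URL for one post in one format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"p": post_id, "digressit-embed": 1, "format": fmt})
    return f"{site.rstrip('/')}/?{query}"


for fmt in FORMATS:
    print(paragraph_embed_url("http://test.jiscpress.org", 15, fmt))
```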

Document sections

This is just the regular WordPress post content in RSS format. In JISCPress terms, it’s the document section which is a single feed item.

http://test.jiscpress.org/2009/07/28/6-how-jisc-invests/feed/?withoutcomments=1

and this is the normal WordPress feed of comments on a particular post/document section.

http://test.jiscpress.org/2009/07/28/6-how-jisc-invests/feed/

We’ve also added the provision of a feed for each document section (‘post’), where each paragraph is a feed item. Note that this makes digress.it a nice tool for building your own feeds out of a single WordPress post.

http://test.jiscpress.org/feed/paragraphlevel/3-jisc-vision-mission-and-objectives/

Per paragraph comments/discussions

For each paragraph, there’s a feed of the comments/discussion.

http://test.jiscpress.org/feed/paragraphcomments/3-jisc-vision-mission-and-objectives,1

Commenter feeds

For each person that comments, there’s a feed of their comments

http://test.jiscpress.org/feed/usercomments/Joss%20Winn
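The three digress.it feed patterns above (paragraphs of a section, comments on one paragraph, comments by one person) can be built the same way. Again, a sketch only: the helper names are hypothetical and the URL shapes are taken from the examples in this post, not from any digress.it documentation.

```python
from urllib.parse import quote


def paragraph_feed_url(site: str, section_slug: str) -> str:
    """Feed in which each paragraph of a document section is an item."""
    return f"{site.rstrip('/')}/feed/paragraphlevel/{quote(section_slug)}/"


def paragraph_comments_url(site: str, section_slug: str, paragraph: int) -> str:
    """Feed of the discussion attached to one numbered paragraph."""
    return f"{site.rstrip('/')}/feed/paragraphcomments/{quote(section_slug)},{paragraph}"


def commenter_feed_url(site: str, display_name: str) -> str:
    """Feed of everything one person has commented; spaces become %20."""
    return f"{site.rstrip('/')}/feed/usercomments/{quote(display_name)}"


site = "http://test.jiscpress.org"
slug = "3-jisc-vision-mission-and-objectives"
print(paragraph_feed_url(site, slug))
print(paragraph_comments_url(site, slug, 1))
print(commenter_feed_url(site, "Joss Winn"))
```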

All the other stuff

Don’t forget that the entire document content is also available as a feed

http://test.jiscpress.org/feed/

http://test.jiscpress.org/feed/rss

http://test.jiscpress.org/feed/rss2

http://test.jiscpress.org/feed/atom

http://test.jiscpress.org/feed/rdf

as are all comments from the site, too:

http://test.jiscpress.org/comments/feed

with WordPress, tags also have feeds

http://test.jiscpress.org/tag/tag1/feed

and so do categories

http://test.jiscpress.org/category/category1/feed

You can also combine tags

http://test.jiscpress.org/tag/tag1+tag2+tag3/feed

and you can combine tags and categories

http://test.jiscpress.org/?category_name=category1&tag=tag2,tag3&feed=rss2

Finally, authors have a feed, too

http://test.jiscpress.org/author/joss/feed/
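The combined tag and category feeds above are easy to get wrong by hand, so here is a small sketch that assembles them. The function names are invented for this example; the URL shapes are the ones shown above.

```python
from typing import Sequence
from urllib.parse import quote, urlencode


def combined_tag_feed_url(site: str, tags: Sequence[str]) -> str:
    """Pretty-permalink form: /tag/tag1+tag2+tag3/feed"""
    joined = "+".join(quote(tag) for tag in tags)
    return f"{site.rstrip('/')}/tag/{joined}/feed"


def category_and_tags_feed_url(
    site: str, category: str, tags: Sequence[str], feed: str = "rss2"
) -> str:
    """Query-string form mixing one category with comma-separated tags."""
    query = urlencode(
        {"category_name": category, "tag": ",".join(tags), "feed": feed},
        safe=",",  # keep the commas readable instead of encoding them as %2C
    )
    return f"{site.rstrip('/')}/?{query}"


site = "http://test.jiscpress.org"
print(combined_tag_feed_url(site, ["tag1", "tag2", "tag3"]))
print(category_and_tags_feed_url(site, "category1", ["tag2", "tag3"]))
```

As far as I know, WordPress treats `+` between tags as "posts with all of these tags" and a comma as "posts with any of them", so the two forms are not interchangeable.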

Summary

WordPress is a versatile CMS for organising, designing and publishing data as feeds, and therefore a useful source of Open Data. JISCPress has extended this versatility by developing further data end-points using digress.it and by offering a simple way of publishing Linked Data to the Talis Platform RDF triple store, where it can be queried and mashed up using the platform’s API.

Scholarly publishing with WordPress

Working on the JISCPress project, I’ve been thinking quite a lot about scholarly publishing on the web, and in particular with WordPress. This morning, I read a post over on the ArchivePress blog about some WordPress plugins which are useful additions for creating a scholarly blog and it got me thinking a bit more about what features WordPress would need to support scholarly publishing.

JISCPress does away with the idea that WordPress is a blogging tool, and instead uses WordPress Multi-User as a document publishing platform, where one site or ‘blog’ is a document. The way WPMU is structured means that despite serving multiple (potentially millions of) document sites, the platform remains relatively ‘lightweight’, as each document site generates just a handful of additional database tables while sharing the same administrative core as a single WordPress install. So, 100 WordPress blogs on WPMU is nothing like the equivalent of running 100 separate WordPress blogs, in terms of both resource requirements and administration. In fact, quite soon there will be no such thing as WPMU, as the two products are going to be merged, and because they already share 90%+ of the same code, it’s not too difficult to achieve.[1]

Anyway, my point here is to discuss whether WordPress can be extended to accommodate most conventions found in scholarly publishing and, where it is lacking, to identify the development work required to meet the needs of most academics who wish to write on and publish to the web.[2]

Scholarly publishing extends to a wide variety of published outputs. As a Content Management System (CMS) and technology development platform, I believe that WordPress has the potential to support any type of scholarly publishing that the web supports. It is extremely extensible, as can be seen from the 6000+ plugins that are available. However, what I’m interested in is what can be done now, by an academic wishing to publish their work through the use of WordPress acting as a CMS. What can be achieved with a few quid[3] to self-host WordPress, so that a few plugins can be installed and a well structured, typical, scholarly paper can be published?

My Dissertation

For some time, I’ve been meaning to publish my MA dissertation. Back in 2002, I undertook some unique research which has not, to my knowledge, been repeated and I think there is some value in having it easily accessible on the web. I have an OpenOffice file and a PDF and, in the course of a morning, have published it under my own domain. The reason I did not publish it on the university WPMU platform is that I have been experimenting with different plugins and did not want to install plugins that were untested or that we may not support long-term. In this case, I’ve used a single WordPress installation, but ideally an individual researcher, group of researchers or research institution would run a WPMU installation which allowed multiple documents to be authored individually or collaboratively[4] and published directly to the web as XHTML.

BuddyPress, by the way, can make the experience even more natural, not only because it is based around a community of like-minded people writing together  on the same web publishing platform, but also because, with a few tweaks here and there, we can move away from the language of blogs and towards the language of documents.


BuddyPress admin bar

Profile menu

Enough of BuddyPress on WPMU for now and back to my dissertation. I set up the site in ten minutes, without using FTP or a command line, because I use a host that provides a one-click install of WordPress, and WordPress allows you to search for and install plugins from its Dashboard. Once the site was installed, I made some basic changes to the settings, turning on XML-RPC and AtomPub so that, if I decided to, I could publish to the site using my word processor.[5] I didn’t use this in the end, but trust me, it works very well using recent versions of MS Word, OpenOffice (free) and other blogging clients such as MS Live Writer (free).

So, what are the common characteristics of an academic paper? What does WordPress have to support to provide functionality that meets most scholars’ publishing requirements? I scratched my head (and asked on Twitter) and came up with the following:

  • footnotes/endnotes
  • citations
  • use of LaTeX (sciences)
  • tables
  • images
  • bibliography
  • sub-headings
  • annexes
  • appendices
  • dedication
  • abstract
  • table of contents
  • index to figures
  • introduction
  • exposition
  • conclusion

Many of these are supported in WordPress by default and don’t require any additional plugins (tables, images, sub-headings, annexes, appendices, dedication, abstract, introduction, exposition, conclusion, are all either basic literary conventions or just part of a simply structured document).

For additional support, I installed digress.it, which we have funded through the JISCPress project. This is a WordPress plugin which allows readers to comment on the paragraphs of a document, rather than at the document section level. We’re adding a lot more functionality to meet the objectives of the JISCPress project, but I chose digress.it, principally for the reason that it is designed to turn a WordPress blog into a document site. I could have used any other WordPress theme, but digress.it automatically creates a Table of Contents and allows you to re-order WordPress posts when they are read so that you don’t have to author your document in reverse or adjust the publication dates so the document sections appear in the correct order.

My dissertation published using digress.it

I added the abstract for my dissertation to the ‘about’ page, so it shows up on the front of the site. I also uploaded a PDF version so that people can download it directly. You’ll see that I also added some links to a related book and DVD, which will certainly appeal to people who are interested in my dissertation. The links pull an image and some basic metadata from Amazon, using the Amazon Machine Tags plugin. This could be used to link to the book in which your article is published and earn you money in click referrals. An alternative would be the Open Book Book Data plugin, which retrieves a book cover and metadata from Open Library, where your book may already be catalogued. If it’s not on Open Library, catalogue it!

After setting this up, I installed a few more plugins:

Dublin Core for WordPress: Automatically adds ten Dublin Core metadata elements to the document mark up.

wp-footnotes: This allows you to easily add footnotes to your document by enclosing your footnote in double parentheses.[6]

OAI-ORE Resource Map: Automatically marks up the document sections with an OAI-ORE 1.0 resource map.

Google Analyticator: Adds Google Analytics support so you can collect statistics on the readership of your document.

WP Calais Archive Tagger: Analyses your entire document and automatically keywords each section, using the Open Calais API.

Search API: WordPress comes with search built in, but there is a new search API which will eventually make its way into the WordPress core. I’ve installed the plugin to provide full-text search across the document. It can also add Google Search to your document site.

wp-super-cache: This is simple to install and will significantly speed up your document site, making it a pleasure to navigate through and read :-)
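To illustrate the double-parentheses convention that wp-footnotes uses, here is a rough sketch of the idea in Python. This is not the plugin's code (the plugin is PHP, and its real output is HTML with links); it only shows how text written as `((like this))` can be pulled out of the body and numbered.

```python
import re
from typing import List, Tuple

# Anything between "((" and the next "))" is treated as a footnote.
FOOTNOTE = re.compile(r"\(\((.+?)\)\)", re.DOTALL)


def extract_footnotes(text: str) -> Tuple[str, List[str]]:
    """Replace each ((footnote)) with a [n] marker and collect the notes."""
    notes: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        notes.append(match.group(1).strip())
        return f"[{len(notes)}]"

    return FOOTNOTE.sub(replace, text), notes


body, notes = extract_footnotes(
    "A few quid((I pay $5/year for my domain.)) is enough to self-host."
)
print(body)
for number, note in enumerate(notes, start=1):
    print(f"{number}. {note}")
```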

Plugins I didn’t use

wp-latex: Although I didn’t need it for my dissertation, it’s worth noting that WordPress supports the use of \LaTeX.

Academic Citation: You need to add a line of code to your theme for this to display. It supports the concept of an article being a single blog post, rather than a ‘document site’ and displays a variety of citation formats for readers to use.

Do you know of any other plugins for a scholarly blog?

The Beauty of Feeds

The other useful thing about managing a document using WordPress and in particular, using digress.it, is that you automatically get RSS/Atom feeds for the document. I’ve already discussed these in detail. It means that I was able to read my document in my feed reader, with footnotes and images displayed correctly.

Document in Google Reader

See how nicely the formatting is preserved. \LaTeX is also rendered correctly in feed readers.

Document formatted nicely in Google Reader

Reading my dissertation in Google Reader

You’ll see that the document sections are listed in order; that is, first section on top. As I noted above, blogs list posts in reverse (most recent first), so I ran the feed through Yahoo Pipes and sorted the items in ascending order. Yahoo Pipes exports as RSS and it’s that feed that I subscribed to in Google Reader. Wouldn’t it be nice if I could import my document feed into an Institutional Repository? Wait a minute, I can! :-)

Importing an RSS feed into EPrints

Click to see the item in the repository

When importing the default feed, the HTML output is accurate but in reverse order, while the RSS output from Yahoo Pipes didn’t import into EPrints very cleanly at all. I’ll work on this. UPDATE: Forget Yahoo Pipes. WordPress feeds can be sorted with a switch added to the URL: http://example.com/feed/?orderby=post_date&order=ASC
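If you're stuck with a feed you can't add that switch to, the reordering itself is simple enough to do yourself. Here is a minimal sketch using only the Python standard library. It assumes a plain RSS 2.0 feed in which every item has a `pubDate` with a timezone offset; the function name and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def sort_items_oldest_first(rss_xml: str) -> str:
    """Return the same RSS document with its items in ascending date order."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = channel.findall("item")
    # RSS dates are RFC 822 style, which parsedate_to_datetime understands.
    items.sort(key=lambda item: parsedate_to_datetime(item.findtext("pubDate")))
    for item in items:
        channel.remove(item)
    for item in items:
        channel.append(item)  # re-attach in the new order
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<rss version="2.0"><channel><title>My document</title>
<item><title>Section 2</title><pubDate>Wed, 29 Jul 2009 10:00:00 +0000</pubDate></item>
<item><title>Section 1</title><pubDate>Tue, 28 Jul 2009 10:00:00 +0000</pubDate></item>
</channel></rss>"""

for item in ET.fromstring(sort_items_oldest_first(SAMPLE)).iter("item"):
    print(item.findtext("title"))
```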

So there it is. An academic paper, published to the web using a modern CMS which supports most authoring and publishing requirements. I would favour an institutional WPMU platform for academics to author directly to, publish their pre-print to the web for open access and detailed comment, and import their RSS feed into the repository. As a proof of concept, I’m quite pleased with this. We are currently developing a widget that can be embedded in a web page or WordPress sidebar and allow a member of staff to upload a document or zipped folder of documents to the Institutional Repository. I wonder if we can also support the import of a feed from the widget, too?

So, what would your requirements be? Tell me and I’ll do my best to test WordPress against them.

  1. Has anyone done a diff on the two code bases to measure exactly what percentage of the code is shared between WP and WPMU?
  2. Actually, I think I’ll save the discussion of its shortfalls for my next post. This one is already long enough.
  3. I pay $5/year for my domain name and as many sub-domains as I need. I pay $10/month for my hosting with unlimited storage and bandwidth.
  4. Like any decent CMS, WordPress supports role-based authoring and editing and maintains a revision history of edits, auto-saved once per minute. Revisions can be compared side by side.
  5. On a scholarly WPMU installation, plugins could be pre-installed and activated, a default theme selected and settings tweaked so very little work is required by the academic author prior to writing her document.
  6. I am using the plugin on this blog!

Related posts

JISCPress: A document discussion platform

We’re very pleased to announce that JISC have agreed to fund JISCPress, a six-month, £32,500 project led by the University of Lincoln, in partnership with the Open University and based on WriteToReply. JISCPress will provide a scalable community platform for publishing and discussing project calls and final reports, in order to support the grant bidding and project dissemination processes.

As you may know, WriteToReply is run in our spare time – lots of late nights and busy lunchtimes. Since launching the re-publication of the Digital Britain – Interim Report, we’ve been looking for ways to bring benefits from our work on WriteToReply, into the Higher Education community where we work. JISC fund much of the UK development and innovation in the use of ICT in teaching and research and in March, announced their Rapid Innovations funding call.

We quickly re-published the call on WriteToReply to demonstrate the benefits of publishing funding calls in this way and then went on to submit a bid which proposed a community platform for the JISC funding call process, based on our experience of setting up and running WriteToReply. As with WriteToReply, this will be an open, public project and all documentation and code will be available under open licenses.

JISCPress is a platform aimed at people working in UK Higher Education, but the platform itself could be easily adapted for other uses, just as WriteToReply is primarily focused on government consultation documents. The final platform will be available as an Amazon Machine Image so anyone will be able to host their own multi-document discussion platform with all the benefits you see on WriteToReply plus the additional features we’ll be developing throughout this project. We’re already advocating the use of the platform in our own universities for the open (and closed) discussion of institutional strategies, for the critique of texts by students and for peer-review of research papers. What might you use it for?

Over on the JISCPress project blog, you’ll find links to a mailing list, wiki and code repository. Feel free to join us if this WriteToReply spin-off appeals to you. If you know anyone that might be interested, please do let them know.

You’re probably already aware that WriteToReply uses WordPress Multi-User and CommentPress. Eddie Tejeda, the developer of CommentPress, will be working with us on the project and this will result in significant further development of CommentPress 2. So, if you’re interested in WPMU and CommentPress (as many people are), please consider following, contributing to and testing JISCPress.

We should also note that while the project is a spin-off of our work on WriteToReply, neither Tony nor Joss is personally receiving any funds from JISC. The contributions from JISC to cover our time on this project are paid directly to our employers and do not result in any financial benefit to us or WriteToReply (which is in the process of being formalised as a non-profit business). In other words, while WriteToReply is a personal project, JISCPress is part of our normal work as employees of our universities (both Tony and I are expected to routinely bid and win project funds – you get used to it after a while!). Money has been allocated to fund dedicated developer time to the project, which will pay Eddie and Alex, a student at the University of Lincoln, for their work as freelancers.

Anyway, on with the project! Here’s the outline from our original bid document:

This project will deliver a demonstrator prototype publishing platform for the JISC funding call and dissemination process. It will seek to show how WordPress Multi-User (WPMU) can be used as an effective document authoring, publishing, discussion and syndication platform for JISC’s funding calls and final project reports, and demonstrate how the cumulative effect of publishing this way will lead to an improved platform for the discovery and dissemination of grant-related information and project outputs. In so doing, we hope to provide a means by which JISC project investigators can more effectively discover, and hence build on, related JISC projects. In general, the project will seek to promote openness and collaboration from the point of bid announcements onwards.

The proposed platform is inspired and informed by WriteToReply, a service developed by the principal project staff (Joss Winn and Tony Hirst) in Spring 2009 which re-publishes consultation documents for public comment and allows anyone to re-publish a document for comment by their target community. In our view, this model of publishing meets many of the intended benefits and deliverables of the Rapid Innovation call and Information Environment Programme. The project will exploit well understood and popular open source technologies to implement an alternative infrastructure that enables new processes of funding-related content creation, improves communication around funding calls and enables web-centric methods of dissemination and content re-use. The platform will be extensible and could therefore be the object of further future development by the HE developer community through the creation of plugins that provide desired functionality in the future.

Subject to user requirements, our planned project deliverables are:

  • A WordPress Multi-User based platform for authoring and publishing JISC funding calls in a form that allows paragraph-level comment and discussion either locally or remotely.
  • A meta-site that aggregates all document data into a single site for search, navigation by categories and tags and can syndicate searches, tags and categories.
  • Develop CommentPress to meet WCAG 2.0 accessibility guidelines, meeting public sector requirements.
  • Evaluation and integration of “related content” utilities to dynamically link related project calls and reports based on content and/or semantic analysis.
  • Evaluation and possible integration of remote, realtime messaging services such as Twitter and XMPP integration.
  • Evaluation and possible integration of enterprise authentication services such as LDAP and Shibboleth.
  • Evaluation and possible integration of OpenCalais, a semantic tagging service.
  • Documentation on how to exploit the benefits of AWS and clone the project instance for other uses.
  • A documented suggested workflow for document authors
  • Documented examples of how to fully exploit the platform for data extraction and syndication.
  • Documented ‘user stories’ for the JISC funding call process.

If this sounds interesting, please do take a look at the full project proposal and join us on the mailing list.
