Ellen Blogs Research
Friday, 22 November 2013
Who are the innovators in academic publishing?
It's been a while hasn't it? Sorry to my avid readers who've been checking in every week in hopes of a new post (hello Mum!). And to the rest of you - welcome back to my occasional ramblings.
I've been in America. Lucky me! One of the highlights of the trip was a flying visit to Stanford University, where I witnessed, with some awe, what an $18.7bn endowment will buy you. I was visiting a biologist friend, and once we'd finished poking gentle fun at the extravagant memorial to Mrs Stanford's former tennis partner, I took her to the library. Turns out that it was her first visit, in almost three years as a post-doc. (That's a subject for another time, perhaps.)
Stanford Library is a wonderful set of buildings with outstanding facilities and collections, but that's not what I want to blog about today. No, instead I am going to write about Donald Knuth. Knuth is a Stanford computer scientist and mathematician with whom I am more and more impressed as I learn (thank you, Google) that he is the subject of an xkcd comic and the owner of a pipe organ. But neither of these wonderful facts is what first drew me to him. No, it was a few scraps of paper in a small archival cabinet in the Green Library at Stanford.
In 1976, Knuth received the proofs for the second edition of his seminal work, The Art of Computer Programming. The first edition had been published using metal type, while the second used photographic techniques. Knuth was deeply unimpressed with how his mathematical equations appeared when set using the new technology, and turned his considerable talents towards devising something better. It took him more than ten years, but he developed a typesetting system, TeX, which is used to this day by publishers including CUP, Elsevier, OUP and Springer.
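For anyone who has never seen TeX in action, it works by compiling plain-text markup into typeset pages. The snippet below is a minimal illustration of my own (standard LaTeX, the macro system built on Knuth's TeX) showing the sort of source an author writes to set an equation:

    \documentclass{article}
    \begin{document}
    The sum of the first $n$ integers is
    \[
      \sum_{k=1}^{n} k = \frac{n(n+1)}{2}.
    \]
    \end{document}

Run through a TeX engine, this produces the equation beautifully typeset - the kind of mathematical setting Knuth felt the photographic proofs of the 1970s had failed to deliver.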
This got me thinking. We talk a lot about the unpaid labour that academics put into the publishing process: generating articles, undertaking peer review, in many cases doing much of the work associated with editing journals. But Knuth's work went beyond this: he saw a problem with the wider publishing system and set out to resolve it, leading to an innovation which is now of benefit to publishers and academics worldwide. He encouraged and enabled change.
It's not an isolated example. Open access publishing seems to be particularly rich in evidence of academics leading, or even forcing, innovation. Researchers were crucial in setting up PLoS and arXiv, organisations which have subsequently driven developments among other more traditional publishers (think of the mega-journals that seem to be popping up like mushrooms, or RSC's new open access chemistry repository). The Open Access Toolset Alliance is full of interesting researcher-led innovations in publishing infrastructure, including the kind of typesetting work done by Knuth. And the next generation are getting involved as well: Open Access Button, which tracks reader encounters with paywalls, was founded by two undergraduates, and launched just last week.
There are more innovations beyond the bounds of open access. Figshare was set up by Mark Hahnel while doing his PhD; it helps researchers receive credit for all their outputs, including data. ImpactStory was established by a researcher and a PhD student, while the h-index and the Eigenfactor - measures of researcher and journal impact respectively - were both designed and developed by researchers; all of these are starting to influence the way that publishers talk about the reach of the work that they publish.
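As an aside, the h-index is a nice illustration of how simple a researcher-built metric can be: it is the largest number h such that an author has h papers with at least h citations each. Here's a quick sketch of my own in Python, purely to show the definition (not anyone's official implementation):

    def h_index(citations):
        # Largest h such that h papers have at least h citations each.
        counts = sorted(citations, reverse=True)
        h = 0
        for rank, count in enumerate(counts, start=1):
            if count >= rank:
                h = rank
            else:
                break
        return h

    # Five papers with these citation counts give an h-index of 3.
    print(h_index([10, 8, 5, 3, 1]))  # prints 3

The Eigenfactor is a more elaborate, citation-network-based measure of journal influence, but the point is the same: these are tools designed by researchers which publishers now use to describe the reach of what they publish.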
None of this is to diminish the important work that publishers do in stimulating innovation and driving thinking. Figshare is part of a stable of really interesting companies under the Digital Science umbrella at Macmillan, a model which can provide stability and integration with a large publisher to help small projects grow into fully-rounded services. Altmetric, a fellow Digital Science company, was not an academic venture but a start-up purchased and nurtured by the publishing group. Nature did some interesting and innovative work with Connotea, now deceased. Elsevier Labs has done important thinking around issues such as contributorship (although the website seems to have gone a bit quiet of late). BMJ produces a wealth of data on issues from peer review to ethics, designed to make them the world's first evidence-based publisher. And Palgrave Pivot reflects some interesting thinking by a publisher on the best way to support long-form publication in the humanities and social sciences.
Furthermore, it is probably right that researchers drive a lot of these innovations. They are, after all, the end users of publisher services, as both authors and readers, and they are the ones who know best what is working in its current format and what needs to change. But it's important that publishers acknowledge the important role that scholars often play, not just in providing and reviewing content for journals, but in shaping how those journals look, behave and are brought into being. We don't always talk about that enough.
Monday, 19 August 2013
How my mother almost became an international copyright criminal
My mum has appeared on this blog before. But not, until now, as the hero of a post. She works in the fundraising department of a reasonably large charitable organisation in the medium-sized town where I grew up. Earlier this year, she emailed and asked me whether I had access to a certain publisher's website, as she couldn't get hold of an article she wanted to read.
My first reaction was pleasure that my repeated explanations of what I do for a living have actually sunk in. Then, snapping into web-native daughter mode, I rolled my eyes and clicked on the link she sent me, ready to tell her that she would probably have to pay to get at the article. As it turned out, the piece was published way back in 1998 by an author based at an organisation which does research, but isn’t a university. When I requested the PDF, I was taken to a page offering a range of options for access, all of which required an institutional login. No pay-per-view option at all.
So, I took to Twitter. What should she do?
Several suggestions came back. How about DeepDyve? Good idea: I had a look. The journal's not on there. What about walk-in access at the nearest university? Well, discounting the fact that it's a good half-hour journey to get there plus the time she'd have to spend registering and finding the hard copy of the article, it wouldn't do much good; their holdings of the journal start in 1999.
Email the author, said someone else. Maybe - we didn't try. But a Google search reveals that he's now left the organisation he was affiliated with and moved to the US. I wonder how easily he'll be able to find that PDF from 1998 - probably three or four computers ago? Of course there's no repository at his old organisation - it's not a university.
I had some ideas of my own. Mum's charity is very loosely connected to the NHS. Perhaps that's a way in? But - unsurprisingly - the NHS library services don't subscribe to niche journals on fundraising. This is probably good news for the taxpayer, but not for my mother. BL Document Supply? The journal's not there.
This fairly mundane little anecdote is so interesting to me. We know, from previous studies, that the paywall can be a big, big barrier to researchers outside academia who are seeking to get hold of published information. (I should add that 'Ask your daughter' was not a survey response option on that particular piece of RIN research.) In this case, though, the real challenge is that there is no straightforward way for my mother to access that journal article legally. She could ask her employer to take out a multi-thousand-dollar subscription which includes back issues, but they don't have a library or librarian to manage the process and have no need for any other content from the journal. The article doesn't appear in a Copac search, but she could scour university library websites in case she finds one which holds the 1998 edition of the journal that she wants, travel there, register as a walk-in user and hope that she's allowed onto the computers to read it. Neither of these options is likely to be approved by her budget-conscious line manager.
I find myself wondering who this situation benefits. Not the publisher - they're not making any money out of Mum and in fact are denying themselves the opportunity to make a small profit (not a big one - Mum is working for a charity after all). Not the author of the research, who would presumably be happy to know that it is available and being used. Certainly not my mother, and not the institution that she works for.
I hate to leave a story unfinished, but I'm afraid I can't tell you whether Mum did eventually get her article. As I've said, she couldn't do it legally. Perhaps I emailed a friendly librarian to ask for a copy? Hmmm. You might very well think that; I couldn't possibly comment.
Monday, 5 August 2013
Is openness always best?
Today, I fulfilled a lifetime ambition by appearing in the Guardian. Well, OK, not lifetime. I've only been reading it for about seventeen years. And when I say 'appearing' what I obviously mean is having some research that I worked on alluded to, without any citation, quotation or link to the findings. But still... you take your victories where you find them, right?
Actually, it was a rather dispiriting experience. The journalist had picked up on one finding from our two-year project and used it as a hook for her piece, on how universities are engaging with big data. The finding was one that I blogged about quite early in the project. At the start of that post, there is a big line in bold type which essentially says 'this finding is dodgy! Don't use it!'. We subsequently did some further analysis and came up with a more nuanced interpretation of the data which told a more ambiguous story.
Guess which one made it into the piece?
This being the Guardian, any Tom, Dick or Harriet can weigh in with his or her two penn'orth in the comments section. This makes for pretty fun and occasionally informative reading on some of the articles. But most comments on our work fell into one of two categories. First: 'well durr! how much time and money went into proving this extremely obvious finding?' and second: 'surely these idiot researchers can see that not using the library is a symptom of failure, not a cause?'.
This whole situation relates to some things I've been considering for a while about public access to research, one of the Government's big arguments in favour of open access. I know that people hold quite strong views about the public's ability to engage with academic outputs. I don't have any evidence on that to sway me either way. But this one experience highlights a few points that I'm not sure we really talk about enough when it comes to openness.
First: research is messy. Being open about this messiness is good, but it carries some heavy risks. Before we blogged the early, flawed but headline-grabbing finding, we had a long conversation about whether it was right to share it. We knew that because it was a hard number telling a positive story about libraries, people would pick it up and use it. I was afraid that the message about its flaws would get lost in re-tellings. But we decided that the project was about being open, and openness means showing your working. Unfortunately, I've been proven right. The later, better, results are ignored, and so is the clear health warning on the early, messy ones, because the simple story is too compelling.
Second: just because we make something open doesn't mean people will actually read it. (Is this the publishing version of horses - water - drinking?). We fell at the first hurdle when the Guardian journalist neglected to link to our blog, showing all the findings. But it's not that hard to find via Google (there it is, result number five). Instead, people simply engaged with the journalist's flawed and partial representation of our results. If they had read - even glanced at - the project blog, they would have seen that the finding about dropping out was one tiny part of a much bigger research project on supporting student library usage, which answers the 'what a waste of time' objection in the comments. And they would also have seen that almost every blog post about findings stresses that correlation is not causation; our findings are indicators to support interventions or areas for further research, not explanations for student outcomes. So they don't need to tell us that the relationship isn't causal - we know. But because people only want or have time to engage with the journalist's interpretation, they have a very incomplete understanding of the research.
Third: what is a researcher supposed to do when this kind of thing happens? I'm trying to clear up some errors in this post but, even on a good day, I can't claim that Ellen Blogs Research has the Guardian's reach. Should I go into the comments section and respond to the same misunderstanding each of the seventeen times it occurs? Should I contact the journalist with a hissy-fit email and demand right of reply in the well-read Corrections and Clarifications column?
Finally: some people are really stupid. What's that? Our findings about undergraduates must be nonsense because you finished your postgraduate degree and got a first without using the library once? Well, thank GOODNESS you were around to clear that one up for us! Our two years of statistical analysis completely fall apart in the face of your single anecdote.
Deep breath. I am aware that this isn't life-or-death stuff. Nobody is going to suffer because a few hundred Guardian readers go away with a misunderstanding about a fairly specialist research project on student library usage. But these are questions we need to consider as we begin to open up all scientific research for public access, because some of it will be life-or-death stuff. Let's consider Andrew Wakefield and the MMR nightmare. In this case, a person may have died because of irresponsible scientific reporting and the public's inability to engage with the messiness of science. People want a clear and simple story, and journalists are happy to provide it. And once that story was in the public domain, it proved extremely difficult to counteract, even among people who, by their own confession, ought to have known better.
Now, we might argue that open access could be a solution to these problems. We no longer have to rely on journalists to interpret the findings, we can go back to them ourselves and see what they actually say. But my experience today suggests that this overestimates the enthusiasm and ability of the general public (or, at least, that bit of it which reads and comments on the Guardian website). And, even if people did go back to the original research, would they understand the findings? I'm pretty sure the chap with the anecdata about his degree success wouldn't.
I believe in open access. I think it is a good thing that the general public should be able to see the results of scientific research. But I think we also need to acknowledge that making this complicated, messy, highly technical content open to people who don't have the expertise - or perhaps even the inclination - to explore it properly, is a risk. And that if we are serious about openness we need to do more to help people find, read, understand and critique the original research outputs. I don't know how we do this. But I'd certainly like to start trying to find out.
Friday, 12 July 2013
The future of the book part...one? Or, what is right about getting it wrong
I missed what looked like a brilliant evening at the Wellcome Collection last week. Wrong! was a celebration of scientific failure; a recognition that the things that don't work are sometimes as useful as the things that do. This really echoes some work I've been doing recently with charities which fund medical research. Although interested in open access, many of them are more concerned about ensuring negative findings get published. They don't want to fund researchers to replicate mistakes that have already been made; they want them to do new work and - perhaps - make some mistakes of their own which will inform the next steps taken in the field.
I've been thinking about this a lot in the context of open access books (call it OAPEN-UK-itis). We know that the monographs market isn't working. In the last ten years, sales of the average monograph have declined from around 2,000 copies to just 200. Authors are concerned that their work isn't reaching its widest possible audience - and to be honest, it probably isn't. Editors have to turn down books that they think important and dearly want to publish because they can't make the sums add up. This is not a system in healthy working order.
Open access is touted as one possible solution to the so-called 'monographs crisis': indeed, this was part of the rationale for setting up the OAPEN-UK project. And certainly a business model which doesn't rely on print sales at upwards of £50 a pop could do a lot to resolve some concerns about the reach of academic texts. But I wonder whether this is enough. Are we really engaging with the core reasons that academic monographs are failing? Or are we just setting ourselves up for another fall?
I think part of the difficulty is that we don't always interrogate and articulate what a monograph is actually for in the humanities and social sciences. Compared to a journal article or a conference paper, it does two main things. First, it communicates an author's research findings, viewpoint and accumulated knowledge through a sustained, lengthy argument. Second, it signals to the scholarly community that the author has reached a certain level of attainment. In some disciplines, it is a prerequisite if you want a job or a promotion.
Now, I would argue that we ought to question both those functions. Do we really believe that monographs are the only, or even best, way to share the findings of academic research? There seems to be a growing debate. One thing I took away from the Open Access Monographs conference last week was a strong sense of the book as an artefact; a codified and partial (in both senses of the word - incomplete and also highly personal) record of the more fluid and ongoing conversation which constitutes academic research.
Secondly: can we justify the book as a sine qua non for employment in academia? Yes, it certainly takes a particular set of skills to write a monograph, but are those the only skills a professional academic needs? Probably not. Is a monograph the only way to display them? Probably not. The importance of the book as a marker of success is a relatively recent phenomenon: a professor of English told me that twenty years ago you could be an eminent researcher in her field without publishing a monograph. As I'm writing this, I'm sitting in a university library cafe listening to a supervisor reminding his PhD student that Foucault and Derrida didn't publish their significant books until they were in their forties or fifties, 'years after they finished their PhDs' (I'm not making this up).
But as well as questioning the two functions of the monograph, we ought to question the connection between them. It occurs to me that, just as a book may arrest and 'set in stone' a process of academic research, somehow, the monograph's role as a signal of academic success has ossified the wider conversation about the best way to communicate research outputs in the humanities and social sciences. The book's unquestioned importance in an academic's professional life makes it hard for them to question its role in their intellectual life.
So when we talk about the failure of the monograph market, we need to think beyond failing business models. We need to ask whether the book itself, and the cult that's grown up around it, is failing researchers. Could it be that some monographs are simply very expensive, very time-consuming, 150,000 word insurance policies to prevent job applications from going straight into the waste-paper basket? Could there, in fact, be a better way to communicate certain pieces of research, which scholars are prevented from exploring because they feel they need to write a book in order to support their careers? Conventions have grown up around the book which make it a useful way of judging quality: scholars know, within their own field, which presses maintain the most interesting list or have access to the best peer reviewers. But we must not think that these conventions could not develop, in time, around other ways of communicating research. Just because we don't know, now, how to peer review a collaborative community, doesn't mean we never will.
There's a Samuel Beckett quote which I believe it's almost obligatory to mention when discussing this subject: 'Try again. Fail again. Fail better'. Well, I don't think you can fail better unless you really understand why you failed the first time. Yes, declining library budgets and journal big deals are part of the problem with the monograph market. But it might not just be the business model. Maybe - maybe - another part of the problem is monographs themselves. Let's not set out to solve the monograph crisis without really understanding why it's happening. Let's not create a solution which in five, ten, twenty years' time will fail in exactly the same way. Let's dig a bit deeper, take some genuine next steps towards sustainable scholarly communications in the humanities and social sciences, and make sure that whoever comes after us has some really interesting failures to play around with. That's success.
Wednesday, 3 July 2013
In praise of diversity
Those of you who follow my Twitter feed will have seen that in the past couple of weeks I've been to two really excellent conferences. The first was the OAI8 conference organised by CERN; the second was the Open Access Monographs conference at the British Library, organised by OAPEN and Jisc Collections.
This post was actually already half-written, in a sort of breathlessly-excitable tone, after the first conference. But my inefficiency turns out to have been a blessing in disguise. Despite their very different subject matters and quite different audiences, the two conferences got me thinking about a similar set of issues.
My standout presentation from OAI8, a conference officially on 'open access innovations' but actually covering a lot more than that, was Kevin Ashley's. Kevin was talking about 'quality' in data sharing. He argued that, at present, quality is defined by the data management profession, and usually means clean, corrected, perfectly beautiful data. But, he asked, is this a definition of 'quality' that everyone would agree with? What about the researcher whose interest lies in seeing the original, messy data? What about the one who wants it as soon as possible, not in the fourteen months that one data archive proudly cited as evidence of their very thorough work?
Kevin argues that our notions of quality currently privilege one kind of user, the one who wants reuse-ready data and doesn't mind how long it takes to get it, but ignores many others. We need more diversity in our understanding of data 'quality' to give researchers the choice that they need.
And diversity, for me, really encapsulates the spirit of the OA monographs conference. Cameron Neylon, in a mind-numbingly brave closing keynote which involved 60 randomised slides and no script (no script!!), mentioned how nice it was to be at a conference where speakers turned to each other on stage and said 'let's make sure we talk about this afterwards'. And he was right. The feeling at the conference was collegiate; lots of different ways of 'doing' open access monographs and a sense that there was space for everyone to have a go.
There were some models that were new to me: I was particularly interested in the EU-funded Agora project which is experimenting with Green open access monographs in philosophy and seems to be turning out some interesting data. It was also good to hear a bit more about developing models like OpenEdition, a French infrastructure project for OA books, and the Open Library of the Humanities.
But the best thing, as Cameron said, was a feeling that everyone wanted to engage with each other; that competition, if there was any, was friendly, and that we're expecting to see the diversity of traditional book publishing recreated in an open access world. Books are heterogeneous, and authors want different things from their publishers. Some will want to experiment with open peer review and new ways of exposing the process of writing a book (as Kathleen Fitzpatrick elegantly discussed); others will be keen to integrate different types of content such as data, video, text, or even objects that the reader can reassemble using a 3D printer (Cameron again!). Others are going to be more concerned about brand and reach, and will feel that a traditional publisher with an open access option is the best route for them.
So we need to make sure that the open access monograph environment serves all these users. Let's not, as Kevin Ashley discussed at OAI8, trap ourselves by prioritising a single type of author, or a single set of needs. Let's preserve this diversity, and deal with the issues that it raises (such as visibility and trust of newer publishers) with new solutions (such as the Directory of Open Access Books).
A final anecdote from OAI8. Carlos Rossel is the Publisher at the World Bank, and over the last year he has implemented an open access policy for all publications funded by the Bank (there are a lot). He didn't want to: it was a directive from higher up which he couldn't ignore. But in the first year they have had over ten thousand deposits into the Bank's repository, and over a million downloads. 'I came to open access reluctantly,' he said, 'but I was proven wrong and the people moving towards openness were proven right.'
That's the journey we need to take researchers on in relation to OA books. There's a lot of scepticism out there; many researchers in HSS still see OA as something to be afraid of. To get them to travel alongside us, we need to give them a means of transport which feels safe, comfortable and built for them - in other words, we need to preserve the diversity which will give them choice.
Friday, 11 January 2013
Peerless honesty
Alright, hands up. Who's been consoling themselves after the return to the office (happy New Year, by the by!) with the Twitter hashtags overlyhonestmethods and overlyhonestreviews? If you haven't, may I recommend that you do so at once.
Are you back? Ok. The overlyhonestmethods tag, started by one bored postdoc with the post-Christmas blues, has grown, very fast, into a sort of mass online confessional. I’m sure, like me, you winced with recognition at some of the more candid reasons behind methodological choices - the sampling ones were particularly close to the bone when I think about my Masters dissertation. But I’m proposing to focus in this post on the overlyhonestreviews tag, and (gulp) put down some of my thoughts about peer review.
Not to pat myself on the back, but I think this is pretty brave of me. As the scholarly communications system swirls with new ideas - business models, platforms, what it is that we even publish - peer review seems to be something of a lodestar: a fixed point that we can always return to, a characteristic of scholarly communications that simply cannot change. Researchers, in particular, are very keen on peer review, and in many of the open-access-type discussions I've been attending recently this has been a big theme. Peer review is non-negotiable.
And I think that the overlyhonestreviews tag on Twitter suggests some reasons why this uncritical approach mightn't be the right one. Let's start by saying that these 140-character reviews are funny because they contain a little grain of truth. I imagine we've all read papers (unpublished, and published too) where we've found the methods peculiar, the literature review partial at best, and struggled to see the connection between the modest results obtained and the spectacular conclusions drawn. These author-side problems are all highlighted in the stream of comments, with considerable humour in some cases. And they show peer review working as it should.
But there's another set of comments, with (I suspect) equal levels of truth, which is much more of a concern. A few media stories on overlyhonestmethods have picked up the theme that 'scientists are human too', and as we know, humans are not Made of Nice. This comes out in a subset of joke reviews which reveal the pettier aspects of the current system. You know what I'm talking about here. 'It's pretty good but you haven't cited me - REJECT'. 'Oh dear. You appear to have pre-empted the work I'm doing at the moment, making it virtually unpublishable - REJECT'. 'I met you once at a conference and you were annoying - REJECT'. 'This method was devised after 1990 so I've never heard of it - REJECT'. Or even 'This is a bad piece of science but you cite me and that's good for my h-index - ACCEPT!'. Others reference the power relations in research: 'This paper's so good that if I let it through you might become a threat to me - REJECT'. 'This is a pile of rubbish but I'm a first-time-reviewer who's slightly scared of you, so I'll ACCEPT'.
I'm not suggesting that any of this is routine, or even particularly common in scholarly publishing. There's a lot of exaggeration for comic effect. Nonetheless, that niggling grain of truth remains. Particularly when you consider that the outcome is most likely not an outright 'reject' but a round of time-consuming revisions to include the reviewer's extensive back catalogue, or to explain the method in words of one syllable to readers who have been using it for years. That's not a system which is functioning efficiently.
Of course, we're aware that peer review isn't perfect. Studies have looked at whether specific groups - young people, women, ethnic minorities - are unfairly disadvantaged. And there's been a long debate around the relative merits of single-blind, double-blind and open peer review. But I'm not aware of any studies which look at the underlying process of peer review - whether the attributes identified in the twitter stream are nonexistent, niche or widespread, and what that means for the content that makes up, broadly speaking, the scholarly literature.
You might argue that there are some experiments underway which could help move things on. Initiatives such as PLoS One, the altmetrics movement and a current study at the University of Tennessee looking at how researchers assign authority to scholarly outputs are all working within the same territory as peer review. To that, I would say that these experiments are primarily about assessing significance rather than quality. Most are working with outputs that either haven't been quality assured at all (blogs, for example) or those which have been formally published (journal articles) and have therefore already been through peer review to check they're technically sound. Indeed, this is the key aim of PLoS One, and the area where altmetrics probably have the strongest claim to usefulness (for now, at least).
So, we must assume that there are some - not many, but some - articles which are technically sound, but which don't ever make it through the peer review process to post-publication judgement because they've come up against the barriers outlined in the stream of tweets. And, that there are some - fewer, probably - articles which aren't technically sound but do make it through the peer review process, to be judged by readers only on their significance while the quality is taken as read.
Perhaps this isn't a great revelation. But my strong impression, on reading what started out as a lighthearted distraction from work, is that we really should look beyond the jokes to talk about this seriously. As governments, funders and researchers themselves press for changes in the way research is communicated - including sharing new types of work such as datasets - we should seize the opportunity to ask whether peer review is the only - or, indeed, the best - way to ensure that scholarly communication works effectively. I'm not saying that we have the answers. But we never will, unless we overcome our peer review taboo and start to ask the question.