I was recently made aware of a competition being run by Naturejobs for students and ECRs (http://blogs.nature.com/naturejobs/2016/07/18/scidata16-publishing-better-science-through-better-data-writing-competition/?WT.mc_id=TWT_SCIDATA_1607_WRITINGCOMP_OA), to write a short piece about, in a nutshell, good practice in data management. One of the questions in particular piqued my interest: “how should early career researchers engage with open science?”
After consideration, I decided not to enter, for three reasons: partly because I’m probably too busy to update my CV by the deadline for entry, partly because there’s no way I could limit myself to only 600 words on this important topic [edit: final count 1480 words!], but mostly because the piece I’d like to write could end up being too critical of the model of scientific publishing that is entrenched by journals like Nature. Nonetheless, I was sufficiently enthused that I decided to write the piece anyway, and publish it here.
How [and why] should early career researchers engage with open science?
To me, “open science” is a problematic term. That’s because it implies that this way of ‘doing science’ is in some way different from the norm – which would presumably be ‘closed science’. And closed science is fundamentally unscientific.
I remember as a 10-year-old getting one of my earliest experiences of scientific experimentation in the classroom (I think it was the lemon battery experiment!). I’m sure the physics of the experiment were far and away beyond the comprehension of my classmates and me – what we were really being taught was the scientific method, and specifically how to write up an experiment. “Put enough detail into your materials and methods that anybody could repeat your experiment and get the same result”, we were taught.
Of course, in modern, hi-tech science, we have to adjust that to read “put enough detail in that anybody with access to the necessary equipment could repeat your experiment”. Not everybody will have access to a Next Generation Sequencing platform, or a powerful computer cluster, or a tract of farmland they can manipulate crops on – and that’s inevitable, and it’s fine. But one thing that should not be, and should never have been, counted as ‘the necessary equipment’ is an institutional library with enough purchasing power to buy journal subscriptions. Now, I know that some fantastic tools like Sci-Hub have been set up to allow people who don’t have that equipment to access papers in subscription journals, but this should not be treated as an excuse to take the easy option. The burden, in my opinion, is on the researcher to make sure their audience doesn’t need to resort to such methods (which, I have it on the best authority, are technically illegal to use!). If your science is to actually be real science, it has to be openly accessible.
Given all that, here are my tips for ECRs on how to make your science open. These are all things that I didn’t know when I started my PhD, haven’t always followed to the letter during my PhD, but now consider to be essential to any papers I publish in the future.
1. Make your manuscripts Open Access
Let’s start simple. There is really no reason not to make your papers Open Access (OA) when you publish them. I think most people are now aware of the two tiers of OA that are available: “Gold OA”, where you pay to make your paper accessible from day one, and “Green OA”, where the journal paywalls your paper and you are allowed to self-archive (make a copy openly available online) after an embargo date – usually six months or a year.
Clearly, if at all possible, Gold OA is the preferred option, but many people are put off by the high cost of the Article Processing Charge (APC) – for my recent article in Global Change Biology, this was US$4,000. One common misconception that I certainly had when I began my PhD was that I was personally responsible for this – not in the sense that it would come out of my personal bank account, but rather that it would come from my project budget. In more and more cases, this isn’t true. If you’re in the UK and your project is RCUK-funded, your university/institution pretty much has to pay your APCs – this is because they are given money by RCUK specifically to cover APCs for RCUK-funded projects! Even if you’re not RCUK-funded, it’s always worth asking your university to cough up (most university libraries now have a designated OA Officer) – they may have some of this money left over at the end of the year, or have designated some additional money for internal projects. If you’re not at the same university any more, it’s again worth asking your co-authors (especially your PI) if their university will cover the costs.
If you really, really can’t get your APCs covered, then at the very least you should self-archive as soon as possible. I’m not going to cover this in great detail, but here is a website (declaration of interest: I helped to research an update to this archive in 2014) that will illustrate how doing so can benefit you.
2. Make your scripts Open Access…
This one’s less obvious. Something I’ve regularly spotted since I first started participating in peer-review is a methods section that spends several paragraphs describing every intimate detail of the fieldwork and labwork, and follows it with something like “we analysed the data in R using GLMs”. To be fully repeatable, your analysis needs to be described in as much detail as the rest of your work. You could spend several more paragraphs describing every test you did – or you could archive your R scripts online (if you’re not using R yet, why not?!). If you archive your scripts, and point to them from within the paper, then anybody who isn’t sure what statistics you’ve done can simply follow your scripts through, line by line. Easy!
GitHub is a great tool for doing so. You can link it directly to RStudio on your computer (though it’s a bit tricky – here’s how) and upload the latest versions of your scripts as frequently or infrequently as you like. If you’re working in a ‘busy’ area and worried about being scooped, there’s nothing wrong with waiting until you’ve published the paper before you upload your scripts. Personally, I like to upload pretty much at the end of every day when I’m working on my data – this is because Git, the system underneath GitHub, is also a fantastic version control tool, for those times when you change your mind about an edit you made last week, or even just accidentally delete a script!
Managed to delete important R script when tidying up for Github commit. Managed to rescue it, thanks to previous Github commit! #rstats
— Callum Macgregor (@Macgregor_Cal) July 25, 2016
3. …and your data too
Archiving your scripts is an important step, but it’s only of limited use if people can’t actually run the scripts. For that, they need access to your data. There are tons of options for this – many institutions will have their own archives for data (here is the data from that Global Change Biology paper, archived in CEH’s Environmental Information Data Centre), or there are more general options, like Dryad. Shop around and decide which is best for you.
As with scripts, some people will have concerns about being scooped – again, there’s nothing wrong with waiting until you’ve published everything you want to, but read this first. Even if people do reuse your data in their own studies, they are likely to (at worst) cite the study you gathered the data for, or (at best) invite you to be a co-author. So, there are potential benefits from archiving your data that go beyond mere reproducibility – it could actually increase your scientific productivity, with no extra effort!
Now, if you’re going to make both your scripts and your data openly available, it’s important that somebody attempting to use your scripts can follow every step you took. Here’s a set of useful tips for how to manage your data to make this happen.
4. Upload your manuscripts to a preprint server
Tips 1-3 are all things I have done, or am currently doing. This one I hope to try in the future. The arXiv preprint server has been in use in physics, maths and related fields since before I was born(!), but really took off in the last 10 years – at the time of writing, 1,177,855 papers are openly available through it. More recently, bioRxiv has been started to provide the same resource to biologists. The idea is simple: once you think your paper is about ready to submit to a journal, you upload it to bioRxiv. At this point it’s given a DOI and can be cited by other authors. People reading your paper on bioRxiv can send you feedback, anonymously if they wish. It’s really no different to peer-review organised by a journal. You can then update your paper, including this feedback, and re-upload the latest version.
Once you’re happy that you’ve ironed out most of the issues people have, you can submit the paper to a journal (it might even be to your credit to point out that it’s already gone through peer-review on bioRxiv – a range of journals now accept one-click submission directly from bioRxiv). It will probably still be peer-reviewed, but the chances are that any problems have already been ironed out, so those reviews are more likely to be favourable, and your paper is more likely to be accepted. What’s more, it’s a great way of making your work OA even if you can’t pay the APCs (sidebar – I just wondered what the publishers think of this. Turns out, most of them are OK with it!).
So, if you’re an ECR and you haven’t yet thought about making your science open, try one, or two, or all of these steps. Once you realise that open science is a delight that brings huge benefits, rather than a chore that brings extra work, I guarantee you’ll be converted.