Fri 03 October 2014

Filed under nerdvana

Tags reproducibleResearch opensource zotero

1 The problem

Every student, researcher and scientist has to read and make notes and highlight or scribble on texts. In long-gone days I spent ages working on Uni literature and making summary charts of what I read, and I really wish I still had some of those notes at my fingertips.

Then came the electronic age. I pretty soon decided that the only future-proof way of marking up electronic texts is to do it to the actual PDF files themselves. In 20 years I should still be able to open up a PDF from yesterday and find the scribbles I made and thoughts I had.

But how to know which of the zillions of PDFs on my hard drive or in my portion of the cloud I have actually made notes on, without opening them up?

2 The wrong solution: special software

Sure, the excellent zotfile plug-in for the excellent zotero can extract my notes but I can only see them in zotero, not on my hard drive. At some point you can be sure the actual PDFs will get detached from zotero or zotero will get bought by Google and put to death, or whatever. And don’t get me started on Mendeley or even Endnote.

3 The right solution: use my little script

So I wrote my very first python script which you can find at github. You need to have python installed. Make the script executable and save it at the root of the folder where your subfolders with PDFs reside. When you run it, it finds and extracts the annotations or highlights from every PDF file within that folder and all its subfolders. So if your file myfile.pdf has annotations or highlights, it saves a file myfile.annotations.txt in the same folder. The new text file has the same modification date and the same base name as the PDF, so it should get listed right next to its big sister in your file browser, whether your files are sorted by modification date or by title. This means you can see at a glance if you have already marked up a PDF, and you can read all the notes (together with the approximate page numbers) without having to open the PDF and click through it.
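To make the idea concrete, here is a minimal sketch of the "sidecar file" part only. It is not the actual script from github. The function name `write_sidecars` and the `extract` callable are made up for illustration: `extract` stands in for whatever PDF library really pulls the notes out, and is assumed to return a list of note strings for one PDF.

```python
import os
from pathlib import Path
from typing import Callable, List


def write_sidecars(root: Path, extract: Callable[[Path], List[str]]) -> List[Path]:
    """Write <name>.annotations.txt next to every PDF under root that has notes.

    `extract` is a hypothetical helper supplied by the caller; it should
    return the annotations of one PDF as a list of strings (empty if none).
    """
    written = []
    for pdf in sorted(root.rglob("*.pdf")):  # the folder and all its subfolders
        notes = extract(pdf)
        if not notes:
            continue  # no annotations, so no text file is created
        sidecar = pdf.with_name(pdf.stem + ".annotations.txt")
        sidecar.write_text("\n".join(notes) + "\n", encoding="utf-8")
        # Copy the PDF's timestamps so both files sort together by date.
        stat = pdf.stat()
        os.utime(sidecar, ns=(stat.st_atime_ns, stat.st_mtime_ns))
        written.append(sidecar)
    return written
```

Because the text file gets the PDF's own modification time via `os.utime`, the pair stays together whether the file browser sorts by date or by name.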

Include the script in a cron job (or a scheduled task on Windows) to run every 15 minutes or so (it only takes a few seconds for the 34000 files in my folder, of which quite a few thousand are PDFs, of which about 250 have annotations) so you don’t have to remember to run it.

I would be interested to hear if this works on Windows and Mac OS too (I am on Linux).

For pure happiness and efficiency, combine this with my other tips on mirroring your zotero collections on your filesystem. So if, say, you are syncing your PDFs with your phone or tablet, you will also see the same little text files next to your PDFs on the other device too, so you will know which ones you have read and which ones you haven’t.

Pretty nifty.

4 Extra feature: extracting to-dos

When I am reading and commenting on PDFs I often realise I need to carry out further tasks like, say, look up a reference mentioned in the document. So I could switch to some other program (or a sticky note) and write down the to-do and copy the information I need to do the task as well as perhaps the name of the PDF etc etc.

If this is part of your workflow too, you will want to know about an extra feature of my script. If you write a short pop-up note in your PDF and include just the characters “xk” (you can change this in the script) anywhere in the note, then when you run the script you should find a folder called xk in the root of your drive, full of little text files, one for each pop-up note which has “xk” somewhere in it.

So if for example you make a note

remember to ask the Professor for more info on the project xk

on p. 6 of document “mydoc.pdf”, you should find a corresponding note in the folder xk named something like this:

remember to ask the Professor for more info on the project xk - mydoc - p. 6.txt
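The naming scheme above could be sketched roughly like this. It is a guess at the logic, not the code from the script: the function name `todo_filename` is hypothetical, and replacing awkward characters with an underscore is my own assumption about how to keep the file name valid.

```python
from pathlib import Path
from typing import Optional

MARKER = "xk"  # the marker from the post; the real script lets you change it


def todo_filename(note: str, pdf_name: str, page: int,
                  marker: str = MARKER) -> Optional[str]:
    """Return a to-do file name for a note, or None if the marker is absent."""
    if marker not in note:
        return None
    # Assumption: swap characters that are unsafe in file names for "_".
    safe = "".join(c if c.isalnum() or c in " .,_-" else "_" for c in note).strip()
    return f"{safe} - {Path(pdf_name).stem} - p. {page}.txt"


print(todo_filename(
    "remember to ask the Professor for more info on the project xk",
    "mydoc.pdf", 6))
# prints: remember to ask the Professor for more info on the project xk - mydoc - p. 6.txt
```

A note without the marker returns `None`, so only the pop-up notes you deliberately flag end up as to-do files.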


Fri 03 October 2014

Filed under nerdvana

Tags reproducibleResearch opensource zotero

1 The problem

Zotero is a really great tool for organising your scientific and professional documents and their citations, and keeping documents and citations together.

Now usually we want to store documents in some kind of hierarchical or tagged structure. Up to now it has been possible to do that ...

Read More

Wed 10 September 2014

Filed under nerdvana

Tags reproducibleResearch opensource

Wordpress is a great blogging engine. But I spend almost all my time with plaintext files in markdown. Whatever I am working on, from my CV to statistical reports, I have it here on my hard disk and I can compile it as pdf or Word or whatever I feel ...

Read More

Sun 10 August 2014

Filed under resilience

Tags evaluation development

This article is work in progress! Feel free to add a comment. There should be a pdf at this link.

1 Background and motivation…

1.1 Why should we care about resilience?

1.1.1 Self-healing systems - This is what we want!

After the terrible earthquake in Haiti in 2010 ...

Read More

Wed 11 June 2014

Filed under odds and ends

Tags ggplot R reproducibleResearch

Bar charts and histograms are easy to understand. I often write for non-specialist audiences so I tend to use them a lot. People like percentages too, so a bar chart with counts on the y axis but percentage labels is a useful thing to be able to produce.

But how ...

Read More

Tue 10 June 2014

Filed under nerdvana

Tags opensource tech ubuntu

I quite often get asked (and ask myself) what is the best GANTT software for basic use.

Recently we have been comparing a few solutions so here are a couple of quick tips.

If you have a license for Microsoft Project I would use that, though it is a bit ...

Read More

Mon 28 April 2014

Filed under nerdvana

Tags R reproducibleResearch


If you love knitr and rstudio and use them to produce long reports, you probably know that you can produce a table of contents in your html (and pdf) documents. In the newer rstudio (Version 0.98.801 or later) you do it by requesting a toc in the doc ...

Read More

Thu 06 March 2014

Filed under total_speculation

Tags future psychology socialCapital society

Is the free-text search box the defining invention of the last twenty years? I think it probably is. Now more than half the entire Western world (and a lot of the rest of it) can find the answer to more or less any text-based question that occurs to them within ...

Read More

Thu 12 December 2013

Filed under nerdvana

Tags reproducibleResearch zotero

Well, for now I have regretfully given up on docear for writing papers - although it is fantastic to be able to mindmap your ideas and citations, it still leads you into a dead-end when you want to actually get your paper finished - sadly, the export functions are not very pretty and ...

Read More

Tue 10 December 2013

Filed under odds and ends

Tags Bosnia-Herzegovina ptsd


Rita Rosner presented our 10 years after study in Boston at ISTSS this summer.

Read More

Wed 02 October 2013

Filed under social_research

Tags development evaluation reproducibleResearch

Does maintaining supply of water and sanitation in IDP sites after the relief phase encourage people to stay in the sites? Does cutting them encourage people to leave?

This is an analysis of publicly available data - the 11 approximately bi-monthly IOM Haiti DTM masterlists covering the period December 2010 ...

Read More

Mon 16 September 2013

Filed under nerdvana

Tags R reproducibleResearch

Rstudio is a great tool for working with R and R scripts. And Markdown is a great way to write even complex, reproducible documents in plain text. So they make a great combination. BUT:

before when writing markdown in rstudio, you had to write “—-” after your headings to get it ...

Read More

socialdatablog © Steve Powell Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More