## Does maintaining supply of water and sanitation in IDP sites after the relief phase encourage people to stay in the sites? Does cutting them encourage people to leave?

This is an analysis of publicly available data – the 11 approximately bi-monthly IOM Haiti DTM masterlists covering December 2010 to August 2012, which list basic statistics on each Haiti IDP site at each time-point. I conducted it during a mission to Haiti. It attempts to answer a question which was being widely discussed at the time: does maintaining the supply of water and sanitation in IDP sites after the relief phase encourage people to stay in the sites, and does cutting them encourage people to leave? Here is the link.

## RStudio starts to code-fold Markdown

RStudio is a great tool for working with R and R scripts. And Markdown is a great way to write even complex, reproducible documents in plain text. So they make a great combination. BUT: until recently, folding only worked if you typed a row of dashes after each heading, and AFAIK it didn't understand the difference between different heading levels.

And now, all of a sudden, it just works! I am on version 0.98.243. So you don't have to type ---- after each heading, and you can press Alt+L to fold a second-level heading: it will close all the text below it down to the next second- or first-level heading.

All we need now is a keystroke to open or close everything up to a given level, and a way to move folded chunks around. I would *so* like to stop using Microsoft Word for outlining; it gives me heartburn every time I have to admit that it is *still* the best document outliner around.

## No more multiple dropboxes

I wrote before about using multiple Dropboxes on the same computer. I found it very useful – one for work, one for family, and one for freelancing. Unfortunately it isn't possible any more, whichever OS you are on. A big pity. The old instructions on how to do it are still up on the Dropbox wiki, so I wrote to support and got a clear answer:

> Currently you cannot have more than one Dropbox account per user account on a single computer. However, it is possible for each user to have their own Dropbox account.
>
> If you need access to two different Dropboxes on the same computer the best option is to create a second user account within your operating system for the second Dropbox account. However, you would need to log out of one user account and log in to the other account to switch Dropbox accounts. This may or may not be convenient for your needs.

It just might be that there is some kind of hack/workaround, but as far as I can see this is a deliberate change by Dropbox Inc. Hard to understand, because they are going to lose USD 200/year from me when I merge my three paid Dropboxes into one.

## Social Capital article published at last

International Journal of Internet Science
Volume 8, Issue 1

Social Capital and Pro-Social Behavior Online and Offline
Constantin M. Bosancianu, Steve Powell, & Esad Bratović
Central European University, Budapest, Hungary; proMENTE Social Research, Sarajevo, Bosnia and Herzegovina

Abstract: Pro-social behavior, one of the defining characteristics of humans as social beings, plays a vital role in maintaining social bonds and in making social transactions possible. The questions which drive this study are whether there is any association between pro-social behavior or social capital online and offline, and whether we can see different patterns of effects of pro-social behavior on social capital online and offline. Using data obtained through an online survey of 1912 Internet users in Bosnia and Herzegovina, Croatia and Serbia, this study finds that pro-social behavior online and offline are closely related, as are social capital online and offline. In terms of the effects of pro-social behavior, however, we find that whereas online behavior has a stronger impact on online social capital than on offline social capital, the reverse does not hold: offline pro-social behavior has roughly the same impact on both types of social capital. Finally, online pro-social behavior is associated with a greater level of bridging offline social capital, suggesting positive spill-over effects from online acts of kindness. Our results inform future studies that wish to focus on pro-social behavior regarding the dual spheres in which it is present, as well as about the limited cross-over effects that exist between the two.

Keywords: Pro-social behavior, offline, online, social capital, community

## Does it make sense to try to measure progress on the highest levels of a logframe?

Another interesting discussion on the M&ENews mailing list – does it make sense to try to measure progress on the highest levels of a logframe?

A couple of opinions -

True, it is often hard to attribute changes in "higher-level" items to our project. But we will often still want to monitor changes if we can. If a school has introduced a program to reduce truancy, the staff will surely want to monitor the truancy level, even if they know it is only to a certain extent within their power to influence. In other words, indicators can be for actual monitoring (in the ordinary sense of the word) as well as for attribution.

I guess this was the original idea behind the M of M&E, which the pseudoscience of logframes sometimes obscures – the idea of keeping an eye on progress towards an important outcome. This "keeping an eye on progress towards the outcome" is about a lot more than just comparing baseline and endline scores and is, I think, one of the marks of real, effective management. It is something we do instinctively when we really care about reaching the outcome. And we do it just the same whether we have 100% control or only 5% control over the outcome, i.e. regardless of whether success or failure can be totally attributed to our efforts. Ordinary life outside the world of projects and logframes is full of this natural monitoring – just think of what we have to keep an eye on when we are bringing up children, or running a school or a business, or managing a national bank.

So if a program plan or logframe is a living, day-to-day tool to help our project succeed, it will help us visualise what we are trying to reach and give us a handful of key, usually low-tech ways to regularly test what progress we are making and get warning signs if we are getting off track. Anything else in our plan is dead weight or "logframe bloat".

What does a parent keep an eye on to tell if a child is getting too tired to finish a task? Now those are indicators and that is monitoring. Who said you need a PhD for M&E? Parents don’t usually even have program plans and yet are experts at using complex, sensory data to monitor and manage progress towards outcomes.

So – I would say that the primary function of a logframe or program plan should be to help us with real-life, daily monitoring of progress, – and the attribution function is secondary and is derived from the primary function.

In practice, we try to judge performance even where attribution is difficult because control over the outcome is less than 100%. We are always judging the performance of politicians, school directors, managers and so on, though we are quite well aware that their successes and failures are only partly to be attributed to their efforts. We take note of the baseline and endline data but we hopefully don’t take too much notice of it. And interestingly, we rarely complain that we only have a sample of one or that the counterfactuals are insufficient.

One more thing – it is true that changes in “higher levels” sometimes involve changes in attitudes, beliefs etc (which are supposed to be hard to measure objectively, giving us a reason not to monitor progress on them). But it is a myth that they always do. To repeat the same example, reducing truancy is an important behavioural outcome which might have several layers below it in a school action plan. But it is not hard to measure. The MDGs are, I guess, “high-level outcomes” (to continue with this not very helpful language of hierarchical levels) but they are very concrete and not particularly hard to measure.

## Omni test for statistical significance

In survey research, our datasets nearly always comprise variables with mixed measurement levels – in particular, nominal, ordinal and continuous, or in R-speak, unordered factors, ordered factors and numeric variables. Sometimes it is useful to be able to do blanket tests of one set of variables (possibly of mixed level) against another without having to worry about which test to use.

For this we have developed an omni function which can carry out pairwise tests of significance between two variables, each of which can be at any of the three aforementioned levels. We have also generalised the function to include other kinds of variables, such as lat/lon for GIS applications, and to distinguish between integer and continuous variables, but the version I am posting below sticks to just those three levels. Certainly one can argue about which test is applicable in which precise case, but at least the principle might be interesting to my dear readeRs.

I will write another post soon about using this function in order to display heatmaps of significance levels.

The function returns the p value, together with attributes for the sample size and the test used. It is also convenient to be able to test whether the two variables are literally the same variable. You can do this by providing your variables with an attribute "varnames": if attr(x,"varnames") is the same as attr(y,"varnames"), the function returns 1 (instead of the 0 you would get if you hadn't provided those attributes).
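For instance (a minimal sketch with made-up variable names), the identity check boils down to comparing the two attributes:

```r
# label two variables with the same varname
x = mtcars$mpg
attr(x, "varnames") = "mpg"
y = mtcars$mpg
attr(y, "varnames") = "mpg"

# the comparison the function performs: same varname means same variable,
# so it would return p = 1 rather than running a test of x against itself
same = identical(attr(x, "varnames"), attr(y, "varnames"))
print(same)  # TRUE
```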

```r
# some helper functions
classer = function(x){
  # map R classes onto measurement levels: con, nom, ord, str, log
  y = class(x)[1]
  switch(EXPR = y, "integer" = "con", "factor" = "nom", "character" = "str",
         "numeric" = "con", "ordered" = "ord", "logical" = "log")
}

# so you can type xc("red blue green") instead of c("red", "blue", "green")
xc = function(stri, sepp = " ") strsplit(stri, sepp)[[1]]

# now comes the main function
xtabstat = function(v1, v2, level1 = "nom", level2 = "nom", spv = FALSE, ...){
  p = 1
  if(length(unique(v1)) < 2 | length(unique(v2)) < 2) p
  else {
    havevarnames = !is.null(attr(v1, "varnames")) & !is.null(attr(v2, "varnames"))
    notsame = TRUE
    if(havevarnames) notsame = attr(v1, "varnames") != attr(v2, "varnames")
    else warning("If you don't provide varnames I can't be sure the two variables are not identical")
    if(notsame | !havevarnames){
      if(min(length(which(table(v1) != 0)), length(which(table(v2) != 0))) > 1){
        level1 = classer(v1)
        level2 = classer(v2)
        if(level1 == "str") level1 = "nom"
        if(level2 == "str") level2 = "nom"
        if(level1 %in% xc("nom geo") & level2 %in% xc("nom geo")){
          if(class(try(chisq.test(v1, v2, ...)))[1] != "try-error"){
            pp = chisq.test(v1, factor(v2), simulate.p.value = spv, ...)
            p = pp$p.value
            attr(p, "method") = "Chi-squared test"
            attr(p, "estimate") = pp$statistic
          } else p = 1
        } else if(level1 == "ord" & level2 %in% xc("nom geo")){
          if(class(try(kruskal.test(v1, factor(v2), ...)))[1] != "try-error"){
            pp = kruskal.test(v1, factor(v2), ...)
            p = pp$p.value
            attr(p, "method") = "Kruskal test"
            attr(p, "estimate") = pp$statistic
          } else p = 1
        } else if(level1 %in% xc("nom geo") & level2 == "ord"){
          if(class(try(kruskal.test(v2, factor(v1), ...)))[1] != "try-error"){
            pp = kruskal.test(v2, factor(v1), ...)
            p = pp$p.value
            attr(p, "method") = "Kruskal test"
            attr(p, "estimate") = pp$statistic
          } else p = 1
        } else if((level1 == "ord" & level2 %in% xc("ord con")) |
                  (level1 == "con" & level2 == "ord")){
          if(class(try(cor.test(as.numeric(v1), as.numeric(v2),
                                method = "spearman", ...)))[1] != "try-error"){
            pp = cor.test(as.numeric(v1), as.numeric(v2), method = "spearman", ...)
            p = pp$p.value
            attr(p, "method") = "Spearman rho"
            attr(p, "estimate") = pp$estimate
          } else cat("not enough finite observations for Spearman")
        ## TODO think if these are the best tests
        } else if(level1 == "con" & level2 %in% xc("nom geo")){
          if(class(try(anova(lm(as.numeric(v1) ~ v2))))[1] != "try-error"){
            pp = anova(lm(as.numeric(v1) ~ v2))
            p = pp$"Pr(>F)"[1]
            attr(p, "method") = "ANOVA F"
            attr(p, "estimate") = pp["F value"]
          } else p = 1
        } else if(level1 %in% xc("nom geo") & level2 == "con"){
          if(class(try(anova(lm(as.numeric(v2) ~ v1))))[1] != "try-error"){
            pp = anova(lm(as.numeric(v2) ~ v1))
            p = pp$"Pr(>F)"[1]
            attr(p, "method") = "ANOVA F"
            attr(p, "estimate") = pp["F value"]
          } else p = 1
        } else if(level1 == "con" & level2 == "con"){
          pp = cor.test(as.numeric(v1), as.numeric(v2))
          p = pp$p.value
          attr(p, "method") = "Pearson correlation"
          attr(p, "estimate") = pp$estimate
        } else {
          p = 1
          attr(p, "estimate") = NULL
        }
        attr(p, "N") = nrow(na.omit(data.frame(v1, v2)))
      }
    } else {
      # the two variables are literally the same variable
      p = 1
      attr(p, "N") = sum(!is.na(v1))
    } # could put stuff here for single-var analysis
    if(is.na(p)) p = 1
    p
  }
}

# now let's try this out on a mixed dataset: load mtcars and
# convert some variables to ordinal and nominal
mt = mtcars
mt$gear = factor(mt$gear, ordered = TRUE)
mt$cyl = factor(mt$cyl, ordered = FALSE)
s = sapply(mt, function(x) sapply(mt, function(y) xtabstat(x, y)))
heatmap(s)
```

## Building a custom database of country time-series data using Quandl

Encouraged by this post I had another look at Quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.

This graphic is just a quick example, with only two indicators, of what I am aiming to be able to do.

The process on Quandl at the moment is a bit fiddly:

• there is no search function in the API
• the country codes used are different from agency to agency

So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.

• edit the CSV spreadsheet of 2- and 3-letter ISO country codes, plus the actual names. (WHO for some reason uses yet other codes, which I had to paste in by hand; if you find your sources also use other codes, you can add them to the spreadsheet.) Put an x in the "enabled" column to mark the countries you want to use.
• search manually at Quandl for interesting queries and add them to the other CSV spreadsheet, replacing the country code with %s. Again put an x in the "enabled" column for the queries you want, add a human-readable title in the "title" column if you like, and put "alpha2", "alpha3" etc. in the "country_sign" column to mark which kind of country code the query uses.
• run the script below.
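To illustrate the %s placeholder in step 2 (a sketch with a made-up query code – the real codes come from the Quandl site), the script simply fills the placeholder with each enabled country's code via sprintf:

```r
# hypothetical query string as it would be stored in the queries spreadsheet
q = "WORLDBANK/%s_SP_POP_TOTL"

# substitute a 3-letter ISO code, as the script does for each enabled country
result = sprintf(q, "HTI")
print(result)  # "WORLDBANK/HTI_SP_POP_TOTL"
```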

```r
authcode = "yourAuthCodeFromQuandl"

library(Quandl)
library(plyr)     # for rbind.fill
library(ggplot2)

codes = read.csv("countryCodes.csv")
codesE = codes[codes$enabled != "", ]
queries = read.csv("queries.csv")  # the query spreadsheet described above
queries = queries[queries$enabled != "", ]

cou = list()
for(qq in 1:nrow(queries)){
  q = queries$query[qq]
  for(cc in 1:nrow(codesE)){
    co = codesE[cc, queries[qq, "country_sign"]]
    tex = paste(q, co, sep = ".")
    cou[[tex]] = try(Quandl(sprintf(q, co), authcode = authcode), TRUE)
    if(class(cou[[tex]])[1] != "try-error"){
      cou[[tex]]$Indicator = ifelse(!is.na(queries$title[qq]), queries$title[qq], q)
      cou[[tex]]$Country = codesE[cc, "name"]
    }
  }
}

rr = rbind.fill(cou[sapply(cou, function(x) length(x) > 1)])
rr$Date = as.character(rr$Date)
rr$Year = as.character(rr$Year)
rr$Year = as.Date(ifelse(!is.na(rr$Year), rr$Year, rr$Date))
# you might have to do something like this if your queries are returning
# data in columns with some other label than Value
rr$Value = ifelse(!is.na(rr$Value), rr$Value, rr$Percent)

# then try a graphic for demonstration purposes
ggplot(data = rr, aes(x = Year, y = Value, group = Country, colour = Country)) +
  geom_point(size = 3) + geom_line() +
  facet_grid(Indicator ~ ., scales = "free") +
  theme(strip.text.y = element_text(size = 13, hjust = 0, angle = 0)) +
  theme(axis.text.x = element_text(angle = 90))
```

And voilà. I wanted to put the spreadsheets up as a Google spreadsheet, but it seems RGoogleDocs is not working for R 3.0.

## Changing figure options mid-chunk (in a loop) using the pander package

I wrote already about changing figure options mid-chunk in reproducible research. This can be important, e.g. if you are looping through a dataset to produce a graphic for each variable but the figure width or height needs to depend on properties of the variables – for example, if you are producing histograms and want the figures to be a bit wider when there are more bins. That previous post was about knitr, but at the moment I am using the pander package more than knitr because it makes some things simpler. Changing figure options is a case in point.
Here is the output:

```
Varying widths for graphs in a loop using the pander package
============================================================

Results for: mpg

Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption

Results for: cyl

Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption
```

And here is the code:

```r
Varying widths for graphs in a loop using the pander package
================

<% for (varn in names(mtcars[, 1:2])) { %><%=
var = mtcars[, varn]
pandoc.p.return("")
pandoc.header.return(paste("Results for: ", varn), 3)
pandoc.p.return("")
# calculate some factor to ensure somewhat wider graphs for more bins
fac = 100 * log(length(unique(var)))
%><% evals.option("width", 50 + fac) %><%=
# have to break out of the brew codes to change the width option for the next chunk
qplot(var, geom = "bar") + xlab(varn) %>
<br/>Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption<%
} %>
```

Oh, and to make it all happen:

```r
library(pander)
library(ggplot2)
Pandoc.brew(file = "nameOfFileContainingTheAboveScript.R", output = "loop", convert = "html")
```

## Haiti: Request for Qualifications for research teams to conduct an impact evaluation of the Integrated Neighborhood Approach (INA)

Very proud and happy to see that this idea, which we developed while I was in Haiti with the IFRC, is nearing fruition: 3ie will be issuing a Request for Qualifications for research teams to conduct an impact evaluation of the Integrated Neighborhood Approach (INA), which aims to build resilient urban communities which are safer, healthier and living in an improved habitat.

## knitr: Changing chunk options like fig.height programmatically, mid-chunk

Knitr is a great tool for doing reproducible research. You can produce all kinds of output inside a single knitr chunk, e.g. you can write a loop to produce lots of figures or tables.

The only catch is if you want your figures to have differing captions, heights, etc. (and usually you do). The standard way is to write a separate chunk for each figure and set the options in the chunk header. So you can't produce several differing figures from inside one chunk. Or can you? This works for me, based on hints from Yihui on github.

```
\documentclass{article}
\begin{document}
<<setup>>=
opts_knit$set(progress = F, verbose = F)
opts_chunk$set(comment = NA, warning = FALSE, message = FALSE, fig.width = 6, echo = F)
kexpand = function(fh, cap){
  cat(knit(text = knit_expand(text =
    "<<{{cap}}, fig.height={{fh}}, fig.cap='{{cap}}'>>=\n.q\n@")))
}
@
<<main, results='asis'>>=
library(ggplot2)
.q = qplot(1:10)
kexpand(2, "one")
.q = qplot(1:20)
kexpand(8, "two")
@
\end{document}
```
Warning: wordpress is eating some of my <<>>. Make sure your chunks are formed with the usual chunk syntax.

So one key thing is to set progress and verbose to F, otherwise they destroy the output. Then the little function kexpand expands an inline template which is typed as text inside the function. You can then define your plot as .q, your caption as cap, your height, and so on. You could adapt the function to control other options. Strangely, .q doesn't have to be an argument of the function: you can just set it in the current environment and it gets picked up by the function anyway. I don't know if this is good practice or why it works, but it does.
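One likely reason .q gets picked up is R's scoping rules: a function can refer to free variables that are not among its arguments, looking them up in the environment where the code is evaluated, and knit() by default evaluates chunk text in the global environment, where .q was set. A stripped-down analogy (no knitr involved):

```r
# .q is a free variable of f, not an argument
f = function() .q + 1

.q = 41
first = f()   # 42: f finds .q in the enclosing (here, global) environment
print(first)

.q = 99
second = f()  # 100: the lookup happens at call time, so reassigning .q changes the result
print(second)
```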

I just posted this same trick in response to a question on stackoverflow. Let’s see if it gets accepted.

Update: added argument for figure height.