<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mahesh Poudyal</title>
    <description>Personal homepage of an environmental social scientist.
</description>
    <link>https://poudyal.me/</link>
    <atom:link href="https://poudyal.me/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Mon, 22 Jan 2024 22:59:44 +0000</pubDate>
    <lastBuildDate>Mon, 22 Jan 2024 22:59:44 +0000</lastBuildDate>
    <generator>Jekyll v3.9.3</generator>
    
      <item>
        <title>Moving from PowerPoint to Quarto presentations</title>
        <description>&lt;p&gt;Let me first explain why I wanted to password-protect these slides. Primarily, a lot of them are incomplete and/or out of date, and since I’ve set up continuous deployment on &lt;a href=&quot;https://netlify.app&quot;&gt;netlify&lt;/a&gt;, I don’t want them to be public. Second, these are current and future teaching materials that I have been building from scratch, so I want to restrict public access to them, at least for now.&lt;/p&gt;

&lt;p&gt;I’ve been a big fan of markdown ever since the format came into being, and I often take notes and write early versions of my papers in markdown format, especially in Rmarkdown if they also involve data analysis in R, as I have described in &lt;a href=&quot;https://poudyal.me/research/2017/07/22/My-writing-workflow/&quot;&gt;one of my earlier posts&lt;/a&gt;. I had earlier tried making slides in RStudio using &lt;a href=&quot;https://bookdown.org/yihui/rmarkdown/xaringan.html&quot;&gt;xaringan&lt;/a&gt; but couldn’t really stick to it. When I came across the &lt;a href=&quot;https://quarto.org/docs/presentations/revealjs/&quot;&gt;quarto presentations demo&lt;/a&gt; last year, I was amazed by its smoothness and the features it provided. So, last summer I started converting my old lecture slides to quarto just to check if I would be able to deliver my teaching without using PowerPoint. Luckily, I was able to test it before the teaching term started, as I had to give an invited talk (35-40 minutes) just before the start of the term this year, and I used presentation slides built completely in quarto with animations, slide transitions and all. It worked really well, and being able to zoom in on a section of the slide was like magic to my audience! So that was that: I decided to move all my lecture slides to quarto.&lt;/p&gt;

&lt;p&gt;Quarto presentations are basically static html pages with javascript doing all the fancy stuff in the background. So I thought, why not set up a website with all my lecture slides, which I could access wherever and whenever I needed? I already host my main homepage on github, deploying via netlify, so I decided to do the same for my presentations. But I also needed the website to be password-protected so that only I had access. After a bit of searching, I came across &lt;a href=&quot;https://github.com/robinmoisson/staticrypt&quot;&gt;staticrypt&lt;/a&gt;, which seemed to be exactly what I needed. Better still, I found a &lt;a href=&quot;https://github.com/blairanderson/netlify-plugin-password-protection#readme&quot;&gt;netlify plugin&lt;/a&gt; that implements staticrypt, making it even easier to set up the whole thing. So that is what I did, and you can see the result at &lt;a href=&quot;https://lectures.poudyal.me&quot;&gt;lectures.poudyal.me&lt;/a&gt;. There are already plenty of great ‘step-by-step’ guides on how to build and host websites using github and netlify, so I’m not going to repeat them here. Once you have set that up, you just need three more steps to get quarto to render documents (presentations &amp;amp; any other files, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;index&lt;/code&gt; files) and encrypt them:&lt;/p&gt;

&lt;p&gt;I. Set up a new &lt;em&gt;Environment variable&lt;/em&gt; called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PROTECTED_PASSWORD&lt;/code&gt; within your &lt;em&gt;Site settings&lt;/em&gt; in netlify. This will be used to encrypt/decrypt your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;html&lt;/code&gt; pages.&lt;/p&gt;

&lt;p&gt;II. Add a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;package.json&lt;/code&gt; file at the root of your repository with the following entry:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
    &quot;dependencies&quot;: {
        &quot;@quarto/netlify-plugin-quarto&quot;: &quot;^0.0.5&quot;
    },
    &quot;devDependencies&quot;: {
        &quot;netlify-plugin-password-protection&quot;: &quot;^3.0.2&quot;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;III. Add a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;netlify.toml&lt;/code&gt; file at the root of your repository with the following entry. See the &lt;a href=&quot;https://github.com/blairanderson/netlify-plugin-password-protection#readme&quot;&gt;netlify password protection page&lt;/a&gt; for further details about &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[plugins.inputs]&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[[plugins]]
    package = &quot;@quarto/netlify-plugin-quarto&quot;
[[plugins]]
    package = &quot;netlify-plugin-password-protection&quot;

    [plugins.inputs]
        directoryFilter = [&quot;!node_modules&quot;]
        title = &quot;Protected Page&quot;
        instructions = &quot;Enter your passphrase&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After that, push everything to your github repository&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and deploy on netlify. Everything should work as required. Below are some screenshots of how my website with lecture materials looks now.&lt;/p&gt;

&lt;p&gt;Image 1: &lt;strong&gt;Encrypted landing page&lt;/strong&gt; &lt;br /&gt;
&lt;img src=&quot;https://poudyal.me/assets/img/ppslides00.png&quot; alt=&quot;Encrypted landing page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Image 2: &lt;strong&gt;Once I enter the passphrase above, I get here.&lt;/strong&gt;  &lt;br /&gt;
&lt;img src=&quot;https://poudyal.me/assets/img/ppslides01.png&quot; alt=&quot;Decrypted landing page.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Image 3: &lt;strong&gt;Title page of one of my quarto presentation documents.&lt;/strong&gt;  &lt;br /&gt;
&lt;img src=&quot;https://poudyal.me/assets/img/ppslides02.png&quot; alt=&quot;Screenshot of quarto presentation title slide&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next step for me is to use quarto to build all my websites - currently I use hugo.&lt;/p&gt;

&lt;hr /&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Make sure it is a &lt;strong&gt;Private&lt;/strong&gt; repo, otherwise password protection is pointless. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 19 Nov 2022 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/teaching/2022/11/19/Quarto-slides-with-password-protection/</link>
        <guid isPermaLink="true">https://poudyal.me/teaching/2022/11/19/Quarto-slides-with-password-protection/</guid>
        
        <category>teaching</category>
        
        <category>coding</category>
        
        
        <category>teaching</category>
        
      </item>
    
      <item>
        <title>Moving from twitter to mastodon</title>
        <description>&lt;p&gt;I’m on mastodon, on the &lt;a href=&quot;https://fediscience.org/@mpoudyal&quot;&gt;fediscience.org&lt;/a&gt; instance to be precise. I’ve been fairly inactive on twitter because I’ve not liked what it has become for a while now; my followers will have noticed that from my general lack of engagement on that platform over the last 2-3 years.&lt;/p&gt;

&lt;p&gt;This is a big deal for me given I’ve been on the platform since 2007. I never got into facebook despite joining the platform quite early on, and completely left that platform (and its associated ones, &lt;em&gt;instagram&lt;/em&gt; and &lt;em&gt;WhatsApp&lt;/em&gt;) some 5 years ago. So, twitter has been my main social media platform for the best part of 15 years! I liked twitter from the very beginning.&lt;/p&gt;

&lt;p&gt;In fact the original 140 characters was perfect for me because during most of my first two years on the platform, I used to send sms messages to a specific UK number from my fieldwork sites in Northern Ghana to post my tweets. In those days, each SMS was limited to 160 characters so each tweet would cost the price of a single SMS, which I was happy to pay given the service it was providing me — helping me stay in touch with the outer world via my tweets from the field which often were completely cut-off from the outside world. Occasionally, on one of my fieldwork sites, I even had to climb up a tree to get good enough mobile signal (mostly GPRS, and if lucky &lt;em&gt;edge&lt;/em&gt; network — those old enough to use mobile internet before 3G would remember!). I had a &lt;a href=&quot;https://en.wikipedia.org/wiki/Nokia_E50&quot;&gt;Nokia E50&lt;/a&gt; phone back then, which I loved — I still think it is one of the best mobile phones I’ve used. So I tweeted using my reliable Nokia phone for a number of years, and when I got my first &lt;a href=&quot;https://en.wikipedia.org/wiki/Amazon_Kindle#Kindle_Keyboard&quot;&gt;Kindle Keyboard with 3G&lt;/a&gt; it had an experimental browser feature, which I used to tweet as well.&lt;/p&gt;

&lt;p&gt;So all of these fond memories of using Twitter in the early years are the ones I will cherish the most. Interestingly, a week or so of being on the fediverse feels quite similar in many ways: looking for people to follow, seeing others find you on the platform, and just having a clean timeline with no ads and largely interesting toots. I really hope this will continue and become even better. I have no intention of going back to twitter now, although I’ll keep my account there active for now so that people know where to find me: my name and link on my profile should lead them &lt;a href=&quot;https://fediscience.org/@mpoudyal&quot;&gt;here&lt;/a&gt; 😄&lt;/p&gt;
</description>
        <pubDate>Sat, 05 Nov 2022 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/allelse/2022/11/05/Fediverse/</link>
        <guid isPermaLink="true">https://poudyal.me/allelse/2022/11/05/Fediverse/</guid>
        
        <category>social</category>
        
        
        <category>allelse</category>
        
      </item>
    
      <item>
        <title>Cartographic desktop backgrounds</title>
        <description>&lt;p&gt;Inspired by &lt;a href=&quot;https://twitter.com/simongerman600/status/1342939504865386496?s=20&quot;&gt;this tweet&lt;/a&gt; and &lt;a href=&quot;https://anvaka.github.io/city-roads/&quot;&gt;this tool&lt;/a&gt; that the tweet referred to, I tried to make my own cartographic desktop background for one of my monitors — for the second monitor I’m simply using the one I got for &lt;a href=&quot;https://anvaka.github.io/city-roads/?q=London&amp;amp;areaId=3600065606&quot;&gt;London&lt;/a&gt; using the above tool, which on my desktop looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/london_bg.png&quot; alt=&quot;London map&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Below, I outline the steps for creating a similar desktop background that highlights all the roads within a particular area, but with a base map (e.g. terrain) instead of a plain background. I made one for Kathmandu and surrounding areas in Nepal in R using the &lt;a href=&quot;https://cran.r-project.org/package=osmdata&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;osmdata&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://cran.r-project.org/package=sf&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sf&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://cran.r-project.org/package=ggmap&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggmap&lt;/code&gt;&lt;/a&gt; packages, and it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/ktm_bg.png&quot; alt=&quot;KTM map&quot; /&gt;&lt;/p&gt;

&lt;p&gt;First step, load the required libraries in R.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# load packages
library(osmdata)
library(tidyverse)
library(sf)
library(ggmap)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, get the base map outline and the necessary map data from OSM. With the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;osmdata&lt;/code&gt; package, you are using the ‘overpass API’ to extract OSM data. To get the map outline, you’ll have to define a ‘bounding box’ - basically the four coordinates (minimum and maximum longitude and latitude) that bound your area. If you just want to automatically create a ‘bounding box’ for a certain location, you can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getbb()&lt;/code&gt; function - for example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getbb(&quot;Kathmandu, Nepal&quot;)&lt;/code&gt;. However, I wanted to create the bounding box manually, so I simply went to the &lt;a href=&quot;https://www.openstreetmap.org&quot;&gt;OSM webpage&lt;/a&gt; and used its ‘Export’ feature to select the area I needed, which gave me the bounding box coordinates used below. Once you have your bounding box, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;opq()&lt;/code&gt; function to build the query. Then you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_osm_feature()&lt;/code&gt; function in the query to add the feature you require in your map (in this case “highway”, which includes all the roads). Once you have the final query defined, you can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;osmdata_sf()&lt;/code&gt; function to send it to the overpass server, which returns the data in ‘simple features (sf)’ format for you to plot later. For the base map, you can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_map()&lt;/code&gt; function from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggmap&lt;/code&gt; to pull the base map you want from among the options available.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# set bounding box
bb &amp;lt;- c(85.25, 27.64, 85.46, 27.75) #these coordinates bound Kathmandu and surrounding areas

# build query
q &amp;lt;- opq(bbox = bb) %&amp;gt;% 
  add_osm_feature(&quot;highway&quot;)

# get data in sf format
roads &amp;lt;- osmdata_sf(q)

# get base map, I'm using 'toner-background'
basemap &amp;lt;- get_map(bb, maptype = &quot;toner-background&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You now have everything you need to produce the map. If you are familiar with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; functions, then the following steps should look familiar. The first step below plots the base map, then adds the roads from the sf data (note you’ll have to specify ‘osm_lines’ from the returned object). Finally, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;theme_void()&lt;/code&gt; option removes all the axes etc. to get a clean cartographic plot.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ggmap(basemap) + 
    geom_sf(data = roads$osm_lines,
            inherit.aes = FALSE) + 
    theme_void()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Output from above looks like this, which is my main monitor background shown earlier!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/output.png&quot; alt=&quot;Final output&quot; /&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 28 Dec 2020 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/allelse/2020/12/28/Cartographic-desktop-backgrounds/</link>
        <guid isPermaLink="true">https://poudyal.me/allelse/2020/12/28/Cartographic-desktop-backgrounds/</guid>
        
        <category>dataviz</category>
        
        <category>maps</category>
        
        
        <category>allelse</category>
        
      </item>
    
      <item>
        <title>Visualising NVivo coding with plotly treemap</title>
        <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; &lt;em&gt;This post is only interesting/useful if you work with qualitative data and want to customise the “treemap” you get in &lt;a href=&quot;https://www.qsrinternational.com/nvivo-qualitative-data-analysis-software/home&quot;&gt;NVivo&lt;/a&gt;, one of the most commonly used computer-assisted qualitative data analysis software (CAQDAS) packages. Basically, you can make much better treemap plots with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt; package in R, using the coding frequency data that you can export from NVivo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’ve been coding qualitative data in NVivo for my research for the last few weeks, and one of the things I like doing as soon as I have done a decent amount of coding is to visualise it in some way. While the latest versions of NVivo do come with quite a few options for visualisation, the “treemap”, which you can get through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Hierarchy Chart&lt;/code&gt; option in NVivo, is my favourite. The problem is that I can’t do much with the charts NVivo provides except change colours, and even that only within the limited options available. So, I decided to export the coding data that NVivo uses to produce these charts and use the &lt;a href=&quot;https://plotly.com/r/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt;&lt;/a&gt; package in R to create customisable treemap plots. Once you are in R, you just need the packages &lt;a href=&quot;https://www.tidyverse.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tidyverse&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://plotly.com/r/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://cran.r-project.org/web/packages/RColorBrewer/index.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RColorBrewer&lt;/code&gt;&lt;/a&gt; for the code below to run successfully.&lt;/p&gt;

&lt;h4 id=&quot;i-exporting-coding-data-from-nvivo&quot;&gt;I. Exporting coding data from NVivo&lt;/h4&gt;

&lt;p&gt;You basically have two options: if you use the Windows version of NVivo, you can export the data as an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.xlsx&lt;/code&gt; file (i.e. Microsoft Excel format); if you use the Mac version of NVivo, you can export the data as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.csv&lt;/code&gt; to read into R later. Below are two screenshots of the Mac version of NVivo showing the treemap and the underlying data that can be exported.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/nvivo01.png&quot; alt=&quot;Hierarchy chart in NVivo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is the default treemap you get in NVivo.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/nvivo02.png&quot; alt=&quot;Data export from NVivo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Export List...&lt;/code&gt; menu item to export the data from NVivo.&lt;/p&gt;

&lt;h4 id=&quot;ii-importing-data-into-r-and-structuring-the-df-for-plotly-treemap-plot&quot;&gt;II. Importing data into R and structuring the df for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt; treemap plot&lt;/h4&gt;

&lt;p&gt;This is the only tricky bit in this workflow, as the data from NVivo needs some processing in R to get it into the structure needed for a &lt;a href=&quot;https://plotly.com/r/treemaps/&quot;&gt;treemap plot using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt;&lt;/a&gt; package. I provide replicable steps below, with code, using data from NVivo’s built-in example project.&lt;/p&gt;

&lt;p&gt;First, read data into a new dataframe, clean it a bit, remove unnecessary columns, unnecessary strings from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Codes&lt;/code&gt; column, and split hierarchical nodes (coding terms) into separate columns.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# load necessary libraries
library(tidyverse)
library(plotly)
library(RColorBrewer)

# read data
# this excludes autocoded nodes (can be selected when exporting data from NVivo)
df &amp;lt;- read.csv(&quot;https://raw.githubusercontent.com/mpoudyal/test-data/main/data/nvivo/ex_proj_codes.csv&quot;) 
glimpse(df) #check what you've just imported
names(df)[2:3] &amp;lt;- c(&quot;cref&quot;, &quot;agg_cref&quot;) # simple naming for code frequency columns
df &amp;lt;- df[-c(4,5)] # remove unnecessary columns

# remove &quot;Codes\\&quot; string from the `Codes` column
df$Codes &amp;lt;- gsub(&quot;Codes\\\\&quot;, &quot;&quot;, df$Codes, fixed=TRUE)

# prepare data for plotly treemap
# separate nodes (coding terms) into different columns, this is needed as NVivo exports hierarchical coding as single string with `\` separator
df &amp;lt;- df %&amp;gt;%
    separate(.,
             col = Codes,
             into = c(&quot;l1node&quot;, &quot;l2node&quot;,&quot;l3node&quot;,&quot;l4node&quot;),
             sep = &quot;\\\\&quot;,
             remove = FALSE,
             extra = &quot;merge&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;labels&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parents&lt;/code&gt; columns for treemap plot. This step creates the three columns of codes preserving hierarchy in the structure required for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt; treemap.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;df &amp;lt;- df %&amp;gt;%
    mutate(ids = case_when(
        !is.na(l4node) ~ paste0(l3node,&quot;-&quot;,l4node),
        (is.na(l4node) &amp;amp; !is.na(l3node)) ~ paste0(l2node,&quot;-&quot;,l3node),
        (is.na(l3node) &amp;amp; !is.na(l2node)) ~ paste0(l1node,&quot;-&quot;,l2node),
        TRUE ~ l1node
    )) %&amp;gt;%
    mutate(labels = case_when(
        !is.na(l4node) ~ l4node,
        (is.na(l4node) &amp;amp; !is.na(l3node)) ~ l3node,
        (is.na(l3node) &amp;amp; !is.na(l2node)) ~ l2node,
        TRUE ~ l1node
    )) %&amp;gt;%
    mutate(parents = case_when(
        labels == l1node ~ &quot;&quot;,
        labels == l2node ~ l1node,
        labels == l3node ~ paste0(l1node,&quot;-&quot;,l2node),
        labels == l4node ~ paste0(l2node,&quot;-&quot;,l3node)
    ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
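
&lt;p&gt;If the nested &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;case_when()&lt;/code&gt; calls are hard to follow, the rule they implement can be sketched in a few lines of Python. This is only an illustration and not part of the R workflow: the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treemap_row&lt;/code&gt; and the example paths are hypothetical, and the sketch assumes paths are at most four levels deep (the R code merges anything deeper into the fourth level).&lt;/p&gt;

```python
def treemap_row(code_path, sep="\\"):
    """Return (id, label, parent) for one backslash-separated code path."""
    nodes = code_path.split(sep)
    # the label is always the deepest level of the path
    label = nodes[-1]
    # the id joins the last two levels, mirroring the paste0() calls in the R code
    node_id = "-".join(nodes[-2:])
    # the parent is the id of the path one level up ("" for a top-level code)
    parent = "-".join(nodes[-3:-1])
    return node_id, label, parent


# hypothetical code paths, one per hierarchy depth
for path in ["Economy", "Economy\\Fishing", "Economy\\Fishing\\Decline"]:
    print(treemap_row(path))
```

&lt;p&gt;The point to notice is that each row’s parent is exactly the id of the row one level up, which is what lets the treemap link the rows into a hierarchy.&lt;/p&gt;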

&lt;p&gt;The data is now ready to be plotted.&lt;/p&gt;

&lt;h4 id=&quot;iii-plot-the-treemaps&quot;&gt;III. Plot the treemaps&lt;/h4&gt;

&lt;p&gt;First, treemap of all the coding.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# basic treemap
fig &amp;lt;- plot_ly(
    type = &quot;treemap&quot;,
    ids = df$ids,
    labels = df$labels,
    parents = df$parents,
    values = df$cref,
    textinfo = &quot;label+value&quot;)

# customise the plot with title and annotations
fig &amp;lt;- fig %&amp;gt;% 
    layout(title = list(text = &quot;Treemap of all coding*&quot;,
                        xref = &quot;paper&quot;, yref = &quot;paper&quot;),
               annotations = list(x = 1, y = -0.05,
                                  text = &quot;*Numbers indicate frequency of occurrence for the code&quot;,
                                  showarrow = F, xref = &quot;paper&quot;, yref = &quot;paper&quot;,
                                  font = list(size = 12, color = &quot;charcoal&quot;)))
fig
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Output from above looks like this:&lt;/p&gt;

&lt;iframe width=&quot;900&quot; height=&quot;800&quot; frameborder=&quot;0&quot; scrolling=&quot;no&quot; src=&quot;//plotly.com/~mpoudyal/1.embed&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;While in the interactive &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt; chart above we can zoom on to the coding groups and subgroups, it is often useful to create a new treemap only for the coding group(s) of interest. Below I create two further treemaps simply by subsetting the original data and using the same basic code as above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treemap for the coding group ‘Economy’&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;## subset data
df1 &amp;lt;- df[grepl(&quot;Economy&quot;, df[[&quot;Codes&quot;]]),]

fig1 &amp;lt;- plot_ly(
    type = &quot;treemap&quot;,
    ids = df1$ids,
    labels = df1$labels,
    parents = df1$parents,
    values = df1$cref,
    textinfo = &quot;label+value&quot;,
    marker = list(colors = brewer.pal(12,&quot;Set3&quot;))) # using RColorBrewer package for custom colour

fig1 &amp;lt;- fig1 %&amp;gt;% 
    layout(title = list(text = &quot;Treemap of codes for 'Economy'*&quot;,
                        xref = &quot;paper&quot;, yref = &quot;paper&quot; ),
               annotations = list(x = 1, y = -0.05,
                                  text = &quot;*Numbers indicate frequency of occurrence for the code&quot;,
                                  showarrow = F, xref = &quot;paper&quot;, yref = &quot;paper&quot;,
                                  font = list(size = 12, color = &quot;charcoal&quot;)))
fig1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Output from the code above looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/nvivo_plot02.png&quot; alt=&quot;Treemap for 'Economy' coding group&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treemap for the coding group ‘Natural Environment’&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;## subset data
df2 &amp;lt;- df[grepl(&quot;Natural&quot;, df[[&quot;Codes&quot;]]),]

fig2 &amp;lt;- plot_ly(
    type = &quot;treemap&quot;,
    ids = df2$ids,
    labels = df2$labels,
    parents = df2$parents,
    values = df2$cref,
    textinfo = &quot;label+value&quot;,
    marker = list(colors = brewer.pal(8,&quot;Accent&quot;))) # using RColorBrewer package for custom colour

fig2 &amp;lt;- fig2 %&amp;gt;% 
    layout(title = list(text = &quot;Treemap of codes for 'Natural Environment'*&quot;,
                        xref = &quot;paper&quot;, yref = &quot;paper&quot; ),
               annotations = list(x = 1, y = -0.05,
                                  text = &quot;*Numbers indicate frequency of occurrence for the code&quot;,
                                  showarrow = F, xref = &quot;paper&quot;, yref = &quot;paper&quot;,
                                  font = list(size = 12, color = &quot;charcoal&quot;)))
fig2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Output for the above code:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/nvivo_plot03.png&quot; alt=&quot;Treemap for 'Natural Environment' coding group&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you can see above, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plotly&lt;/code&gt; in R, there is much we can do to customise the treemaps and produce publication-quality figures compared to basic output you get from NVivo. I hope this workflow will come in handy for those of you who, like me, want to produce figures in R but have to rely on NVivo for much of the qualitative data analysis.&lt;/p&gt;
</description>
        <pubDate>Sat, 12 Dec 2020 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/research/2020/12/12/NVivo-treemaps-with-plotly-in-R/</link>
        <guid isPermaLink="true">https://poudyal.me/research/2020/12/12/NVivo-treemaps-with-plotly-in-R/</guid>
        
        <category>nvivo</category>
        
        <category>rstats</category>
        
        <category>dataviz</category>
        
        
        <category>research</category>
        
      </item>
    
      <item>
        <title>Downloading all linked PDFs from multiple URLs using Python</title>
        <description>&lt;p&gt;I’ve been learning &lt;a href=&quot;https://www.python.org&quot;&gt;Python&lt;/a&gt; in my spare time for the past couple of months, initially for data analysis and visualisation. I am still relatively new to the language but have been using it to automate tasks as needed (outside of data work), primarily by adapting/modifying code others have shared on &lt;a href=&quot;https://stackoverflow.com&quot;&gt;Stack Overflow&lt;/a&gt;. Recently, I had to download loads of PDF reports related to SDGs, submitted by a select group of countries, from the &lt;a href=&quot;https://sustainabledevelopment.un.org/memberstates/&quot;&gt;UN’s SDG portal&lt;/a&gt;. So, I decided to use Python to automate the task. Below is a simple web-scraping script I wrote for the purpose, based on &lt;a href=&quot;https://stackoverflow.com/questions/54616638/download-all-pdf-files-from-a-website-using-python&quot;&gt;this&lt;/a&gt; from Stack Overflow.&lt;/p&gt;

&lt;p&gt;My key aim was to download all PDFs linked from a member country page and organise them in a folder for each country. Also, where there were any errors in the links, I wanted the code to ignore those and continue (but also print an error message). Each component of the code is shown below with comments/explanations.&lt;/p&gt;

&lt;p&gt;Loading necessary libraries/packages.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import os
import requests
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Define the base URL and create a list of country-level URLs, which are the ones we want to scrape for PDFs later. You can get the correct country names for the country-specific pages &lt;a href=&quot;https://sustainabledevelopment.un.org/memberstates/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;baseurl = 'https://sustainabledevelopment.un.org/memberstates/'
# these are the countries I wanted; add or remove countries as necessary
countries = ['bangladesh', 'china', 'colombia', 'india', 'kenya', 'madagascar', 'malawi', 'mozambique', 'peru', 'tanzania']
# build list of country-level urls
def build_url(country):
    return baseurl + country
urls = []
for country in countries:
    urls.append(build_url(country))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
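
&lt;p&gt;As a side note, the helper function and loop above can also be collapsed into a single list comprehension. This is only an equivalent sketch of the same step, shown with a shortened country list for illustration:&lt;/p&gt;

```python
baseurl = 'https://sustainabledevelopment.un.org/memberstates/'
# shortened list, for illustration only
countries = ['bangladesh', 'china', 'colombia']

# equivalent to the build_url() function plus the for loop above
urls = [baseurl + country for country in countries]
print(urls)
```

&lt;p&gt;Either form gives the same list; the comprehension just saves a few lines.&lt;/p&gt;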

&lt;p&gt;Using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;for&lt;/code&gt; loop, go through each member country page and download the linked PDFs into the respective folders. To skip any broken links to PDFs and continue the scraping, I use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;try&lt;/code&gt; when actually downloading the files from their linked URLs.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for url in urls:
    # define folder name from member country portion of the url
    foldername = url.split('/')[-1]
    # create a folder for the country if it doesn't exist
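    os.makedirs(foldername, exist_ok=True)
    page = requests.get(url).text
    # name the parser explicitly so BeautifulSoup does not have to guess one
    soup = bs(page, 'html.parser')
    for link in soup.select(&quot;a[href$='.pdf']&quot;):
        pdf_url = urljoin(url, link['href'])
        filename = os.path.join(foldername, pdf_url.split('/')[-1])
        try:
            response = requests.get(pdf_url)
            response.raise_for_status()  # treat HTTP errors such as 404 as failures
        except requests.RequestException:
            print('Could not open url: ', pdf_url)
            continue
        # write the file only after a successful download, so broken links leave no empty files
        with open(filename, 'wb') as f:
            f.write(response.content)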
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
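
&lt;p&gt;The loop leans on two small string operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;urljoin&lt;/code&gt; to turn each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;href&lt;/code&gt; into a full URL, and a split on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&lt;/code&gt; to get the folder and file names. Here is a minimal offline sketch of both. The two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;href&lt;/code&gt; values are made up for illustration and are not actual files on the portal:&lt;/p&gt;

```python
import os
from urllib.parse import urljoin

# country page URL, built the same way as in the code above
page_url = 'https://sustainabledevelopment.un.org/memberstates/kenya'

# hypothetical href values, for illustration only
hrefs = [
    '/content/documents/example_report.pdf',         # root-relative link
    'https://example.org/files/another_report.pdf',  # absolute link
]

# folder name is the last part of the country page URL
foldername = page_url.split('/')[-1]

# relative links are resolved against the page URL; absolute links pass through unchanged
resolved = [urljoin(page_url, href) for href in hrefs]

# file name is the last part of each link, placed inside the country folder
filenames = [os.path.join(foldername, href.split('/')[-1]) for href in hrefs]

for full_url, filename in zip(resolved, filenames):
    print(full_url, filename)
```

&lt;p&gt;Because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;urljoin&lt;/code&gt; leaves absolute links untouched and resolves relative ones against the page URL, the same loop handles both kinds of link.&lt;/p&gt;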

&lt;p&gt;I was able to download 400+ documents in a few minutes, which would probably have taken me hours to do manually! Full code below:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import os
import requests
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin

baseurl = 'https://sustainabledevelopment.un.org/memberstates/'
# these are the countries I wanted; add or remove countries as necessary
countries = ['bangladesh', 'china', 'colombia', 'india', 'kenya', 'madagascar', 'malawi', 'mozambique', 'peru', 'tanzania']
# build list of country-level urls
def build_url(country):
    return baseurl + country
urls = []
for country in countries:
    urls.append(build_url(country))

for url in urls:
    # define folder name from member country portion of the url
    foldername = url.split('/')[-1]
    # create a folder for the country if it doesn't exist
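    os.makedirs(foldername, exist_ok=True)
    page = requests.get(url).text
    # name the parser explicitly so BeautifulSoup does not have to guess one
    soup = bs(page, 'html.parser')
    for link in soup.select(&quot;a[href$='.pdf']&quot;):
        pdf_url = urljoin(url, link['href'])
        filename = os.path.join(foldername, pdf_url.split('/')[-1])
        try:
            response = requests.get(pdf_url)
            response.raise_for_status()  # treat HTTP errors such as 404 as failures
        except requests.RequestException:
            print('Could not open url: ', pdf_url)
            continue
        # write the file only after a successful download, so broken links leave no empty files
        with open(filename, 'wb') as f:
            f.write(response.content)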
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
</description>
        <pubDate>Sat, 14 Nov 2020 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/allelse/2020/11/14/Web-scraping-using-Python/</link>
        <guid isPermaLink="true">https://poudyal.me/allelse/2020/11/14/Web-scraping-using-Python/</guid>
        
        <category>python</category>
        
        <category>coding</category>
        
        <category>web-scraping</category>
        
        
        <category>allelse</category>
        
      </item>
    
      <item>
        <title>Mapping countries' GDP against Fortune Global 500 top 10</title>
        <description>&lt;p&gt;I’ve been experimenting with and learning R and various packages whenever I get time away from work - to keep myself well-versed in some of the skills that I already possess, as well as to learn new tricks and keep up with new developments. As I’ve been curious about using R for mapping (GIS) for a while, this weekend I thought I should learn something new. So, I set about producing a series of maps highlighting the countries of the world with GDPs lower than the annual revenues of some of the largest companies in the world - the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fortune_Global_500#Fortune_Global_500_list_of_2019&quot;&gt;top 10 Fortune Global 500 companies&lt;/a&gt;, to be exact.&lt;/p&gt;

&lt;p&gt;Another motivation came from reading the &lt;a href=&quot;https://chrgj.org/wp-content/uploads/2020/07/Alston-Poverty-Report-FINAL.pdf&quot;&gt;UN Special Rapporteur Philip Alston’s report on extreme poverty and human rights&lt;/a&gt; earlier this week. Two key points stood out for me, quoted below.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Poverty is a political choice”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;“By single-mindedly focusing on the World Bank’s flawed international poverty line, the international community mistakenly gauges progress in eliminating poverty by reference to a standard of miserable subsistence rather than an even minimally adequate standard of living. This in turn facilitates greatly exaggerated claims about the impending eradication of extreme poverty and downplays the parlous state of impoverishment in which billions of people still subsist.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The report makes for very grim reading, especially given the current and potential future impacts of the ongoing COVID-19 pandemic - not to mention the kinds of leadership we have in most of the major economies at this moment. Much of what Prof Alston says in his report isn’t new, particularly the criticism of the World Bank’s measures of global poverty and of the failures of other large global bodies, including the UN agencies, to tackle global poverty over the years. It is still good to see a report at the highest level highlighting these issues.&lt;/p&gt;

&lt;p&gt;Coming back to the maps I created: they not only show how large some of these companies are purely in terms of economic power but, I think, also shed light on global inequalities.&lt;/p&gt;

&lt;p&gt;Here is the first one, highlighting all the countries in the world that had a lower GDP in 2019 than Walmart’s revenue for the year. The American retail giant is number one in the global list of companies, with 2019 revenue of US$514 billion.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/Walmart.png&quot; alt=&quot;Walmart's revenue vs countries' GDP&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The fifth-placed State Grid of China (US$387 billion revenue) is next; the map below highlights all the countries with GDPs lower than this company’s revenue in 2019.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/StateGridChina.png&quot; alt=&quot;State Grid's revenue vs countries' GDP&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Finally, the map below highlights all the countries with an annual GDP lower than the 2019 revenue of Toyota Motor (US$273 billion), the company ranked 10th in the global list.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/ToyotaMotor.png&quot; alt=&quot;Toyota Motor's revenue vs countries' GDP&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As we can see, Walmart’s revenue in 2019 was larger than the GDP of every single country in Africa and of most countries in South America, Brazil being a notable exception. Fifth-placed State Grid of China also had revenue larger than the GDP of every African country except Nigeria, and even 10th-placed Toyota Motor had revenue larger than the GDP of every African country except Nigeria and South Africa.&lt;/p&gt;

&lt;p&gt;I used publicly available datasets and global maps from the R package &lt;a href=&quot;https://cran.r-project.org/web/packages/rnaturalearth/README.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rnaturalearth&lt;/code&gt;&lt;/a&gt; to produce these maps using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sf&lt;/code&gt; packages in R version 4.0.2. You can see or download the data and steps, including the code used to generate the maps, from &lt;a href=&quot;https://github.com/mpoudyal/fg500vsgdp&quot;&gt;my GitHub repository&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Sat, 11 Jul 2020 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/allelse/2020/07/11/Mapping-countries-gdp-against-fortune-global-500-top-10/</link>
        <guid isPermaLink="true">https://poudyal.me/allelse/2020/07/11/Mapping-countries-gdp-against-fortune-global-500-top-10/</guid>
        
        <category>rstats</category>
        
        <category>dataviz</category>
        
        <category>maps</category>
        
        
        <category>allelse</category>
        
      </item>
    
      <item>
        <title>Mass conversion of SPSS files to CSV format in R</title>
        <description>&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;: &lt;em&gt;In this rather long post, I provide a few options for mass conversion of SPSS data files to CSV, including steps to test out the functions on a simulated SPSS dataset. Assumes you already know the basics of working with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R environment&lt;/code&gt;, including installing the packages where necessary.&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;I’ve had to work with a large number of SPSS data files in my job lately, not an ideal scenario as I primarily use R for data processing/analysis. However, if you ever use secondary data, especially in social science disciplines, you are likely to come across survey data recorded in SPSS more often than not. SPSS has certainly been a mainstay of social science research, particularly research involving surveys, for as long as I can remember - I learned to use SPSS for the first time as an undergrad, and that was over two decades ago (giving away my age here!). And it seems the software is still going strong. Digressions aside, I needed a way to easily convert all the SPSS files I had to open data formats like CSV for better archiving and sharing.&lt;/p&gt;

&lt;p&gt;As usual, I started by searching Stack Overflow for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mass conversion of SPSS to CSV in R&lt;/code&gt;, and found answers like &lt;a href=&quot;https://stackoverflow.com/questions/53207055/convert-multiple-sav-to-csv-in-r&quot;&gt;this&lt;/a&gt; and a slightly better version &lt;a href=&quot;https://stackoverflow.com/questions/48693759/how-to-batch-process-converting-all-sav-to-flat-file-that-are-in-a-folder-in-r&quot;&gt;here&lt;/a&gt;. Both were useful in giving me ideas on what I wanted to do, but neither worked for me as written. So, I decided to write my own function(s) to mass convert SPSS files to CSV in R. Below I outline three functions and highlight their pros and cons.&lt;/p&gt;

&lt;h4 id=&quot;function-1-using-convert-function-from-the-rio-package&quot;&gt;Function 1: Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert()&lt;/code&gt; function from the &lt;a href=&quot;https://cran.r-project.org/web/packages/rio/index.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt;&lt;/a&gt; package&lt;/h4&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;RIO_SPSS2CSV &amp;lt;- function(filepath) {
  setwd(filepath) #this is the root dir where SPSS data files/folders are located; .csv files will be stored in the same dir
  library(rio)
  files &amp;lt;- list.files(path = filepath, pattern = '\\.sav$', recursive = TRUE) #recursive option to check all folders inside the root dir; anchored pattern so only names ending in .sav match
  for (f in files) {
    convert(f, paste0(tools::file_path_sans_ext(f), '.csv')) #drop only the final extension, so dots elsewhere in the path are kept
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is the easiest and most straightforward of the three options I outline here. The function above recursively looks for SPSS files within the specified &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;filepath&lt;/code&gt;, and then uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert()&lt;/code&gt; function in the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt;&lt;/strong&gt; package to convert them to CSV files in the same location. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert()&lt;/code&gt; simply wraps the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;import()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export()&lt;/code&gt; functions, which makes the conversion simpler but not faster, as we see below. It is also worth noting that this method writes values for the categorical variables rather than value labels (e.g., for variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; in original SPSS data with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1=Female&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2=Male&lt;/code&gt;, CSV would have 1 or 2 under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; and not &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Female&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Male&lt;/code&gt;), which means you’d need a separate file defining the categorical variables to fully understand the converted CSV files.&lt;/p&gt;

&lt;h4 id=&quot;function-2-using-characterize-function-together-with-import-and-export-from-the-rio-package&quot;&gt;Function 2: Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;characterize()&lt;/code&gt; function together with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;import()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export()&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt; package&lt;/h4&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;RIO_SPSS2CSV_VL &amp;lt;- function(filepath) {
  setwd(filepath)
  library(rio)
  files &amp;lt;- list.files(path = filepath, pattern = '\\.sav$', recursive = TRUE) #anchored pattern so only names ending in .sav match
  for (f in files) {
    export(characterize(import(f)), paste0(tools::file_path_sans_ext(f), '.csv')) #drop only the final extension
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is just a slight (but very useful) tweak to the previous option. It still uses the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt;&lt;/strong&gt; package for the conversion, but instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert()&lt;/code&gt; function, it uses the generic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;import()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export()&lt;/code&gt; functions together with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;characterize()&lt;/code&gt; to convert variables with defined value labels (i.e., categorical variables) to character or factor (e.g., for variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; in original SPSS data with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1=Female&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2=Male&lt;/code&gt;, CSV would now have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Female&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Male&lt;/code&gt; under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; and not 1 or 2). This is particularly useful as you would not need a separate document defining value labels for categorical variables.&lt;/p&gt;

&lt;h4 id=&quot;function-3-using-foreign-package-with-writecsv-function&quot;&gt;Function 3: Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt; package with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write.csv&lt;/code&gt; function&lt;/h4&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FOR_SPSS2CSV &amp;lt;- function(filepath) {
  setwd(filepath)
  files &amp;lt;- list.files(path = filepath, pattern = '\\.sav$', recursive = TRUE) #anchored pattern so only names ending in .sav match
  for (f in files) {
    write.csv(
      x = foreign::read.spss(file = f, to.data.frame = TRUE, use.value.labels = TRUE, use.missings = TRUE, reencode = FALSE),
      file = sprintf(&quot;%s.csv&quot;, tools::file_path_sans_ext(f)),
      row.names = FALSE, na = &quot;&quot;
      )
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This final option uses the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt;&lt;/strong&gt; package, one of the default packages that comes with every R installation, so there is no need to install any extra package for this task. There are a few good points about using the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt;&lt;/strong&gt; package: first, you can easily switch between copying values and value labels using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;use.value.labels&lt;/code&gt; option (see function above); second, it gives you the option to treat user-defined missing values in SPSS as missing values in the converted file by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;use.missings&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUE&lt;/code&gt;; and finally, &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt;&lt;/strong&gt; provides warnings for unexpected values in the original SPSS files, for example, when a certain value in a categorical variable is undefined - allowing users to deal with unexpected cases.&lt;/p&gt;

&lt;h4 id=&quot;testing-the-functions-with-simulated-spss-data&quot;&gt;Testing the functions with simulated SPSS data&lt;/h4&gt;
&lt;p&gt;This section is really only possible because I found &lt;a href=&quot;https://martinctc.github.io/blog/vignette-simulating-a-minimal-spss-dataset-from-r/&quot;&gt;this excellent post on simulating SPSS data&lt;/a&gt; by &lt;a href=&quot;https://martinctc.github.io&quot;&gt;Martin Chan&lt;/a&gt;. The example below is a more or less literal copy from his post linked above - I’ve only tweaked the variables and data types to make them more relatable to the type of data with which I normally work.
Start by loading the necessary packages - &lt;a href=&quot;https://www.tidyverse.org/&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tidyverse&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://github.com/martinctc/surveytoolbox&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;surveytoolbox&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; and &lt;a href=&quot;https://haven.tidyverse.org/&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haven&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; - and creating a directory in which to save the simulated SPSS data file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(tidyverse)
library(surveytoolbox) #if you don't have this, you'll need to install it from the source with devtools::install_github(&quot;martinctc/surveytoolbox&quot;) 
library(haven)
#create a directory to save simulated SPSS data. this will also be the base directory/filepath to test conversion functions above
dir.create(&quot;sav&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I want to simulate a more-or-less typical rural household survey dataset in which the majority of household heads are male. So, I’m going to create a dataset of 200 observations with a high proportion of male respondents - the dataset will have the variables &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; (sex of HH head), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;education&lt;/code&gt; (highest education attainment of the HH head), and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;place_attach&lt;/code&gt; (place attachment). In addition, I’ll make the &lt;strong&gt;education&lt;/strong&gt; variable dependent on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; (with &lt;em&gt;higher&lt;/em&gt; educational attainment skewed &lt;em&gt;towards&lt;/em&gt; male HH heads), and the &lt;strong&gt;place attachment&lt;/strong&gt; variable dependent on &lt;strong&gt;highest educational attainment&lt;/strong&gt; (those with &lt;em&gt;higher&lt;/em&gt; education more likely to have &lt;em&gt;lower&lt;/em&gt; place attachment).&lt;/p&gt;

&lt;p&gt;Let’s begin by creating the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; variables.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;set.seed(97) #this is to ensure reproducibility of this example but not necessary if simply testing SPSS to CSV conversion functions with your own data.

#id variable
v_id &amp;lt;- seq(1, 200) %&amp;gt;% set_varl(&quot;Household Identifier&quot;)

#sex variable
v_sex &amp;lt;- sample(x = 1:2,
                size = 200, replace = TRUE,
                prob = c(.25 , .75)) %&amp;gt;%  #skewed probability to reflect more male HH heads
  set_vall(value_labels = c(&quot;Female&quot; = 1,
                            &quot;Male&quot; = 2)) %&amp;gt;% 
  set_varl(&quot;HH Head's Sex&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;education&lt;/code&gt; variable, which depends on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sex&lt;/code&gt; variable above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#Highest education attainment variable - sex-dependent sampling
v_edu &amp;lt;-
  v_sex %&amp;gt;%
  map_dbl(function(x){
    if(x == 1){
      sample(0:6,
             size = 1,
             prob = c(25, 15, 20, 20, 15, 5, 5)) #Sum to 100
    } else {
      sample(0:6,
             size = 1,
             prob = c(10, 10, 20, 15, 25, 10, 10)) #Sum to 100
    }
  }) %&amp;gt;%
  set_vall(value_labels = c(&quot;Illiterate&quot; = 0,
                            &quot;Literate - no formal education&quot; = 1,
                            &quot;Primary school&quot; = 2,
                            &quot;Lower secondary school&quot; = 3,
                            &quot;Secondary school&quot; = 4,
                            &quot;College/Technical college&quot; = 5,
                            &quot;University degree&quot; = 6)) %&amp;gt;%
  set_varl(&quot;Highest education level&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, create the variable for &lt;strong&gt;place attachment&lt;/strong&gt;, which depends on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;education&lt;/code&gt; variable above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#Place attachment variable - education-dependent sampling
v_place &amp;lt;- 
  v_edu %&amp;gt;% 
  map_dbl(function(x){
    if(x&amp;gt;=4){
      sample(1:5,
             size = 1,
             prob = c(25, 25, 20, 20, 10)) #Sum to 100
    } else {
      sample(1:5,
             size = 1,
             prob = c(5, 10, 20, 30, 35)) #Sum to 100
    }
  }) %&amp;gt;% 
  set_vall(value_labels = c(&quot;Not attached at all&quot; = 1,
                            &quot;Not very attached&quot; = 2,
                            &quot;Neutral&quot; = 3,
                            &quot;Attached&quot; = 4,
                            &quot;Very attached&quot; = 5)) %&amp;gt;% 
  set_varl(&quot;Place attachment&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can now combine individual vectors and save the dataset&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;combined_df &amp;lt;-
  tibble(id = v_id,
         sex = v_sex,
         education = v_edu,
         place_attach = v_place)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Save the combined data to the new directory created at the beginning. Also create a couple more SPSS files by subsetting the main simulated data, so we have more than one file on which to check the conversion functions.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#save simulated data in SPSS format
combined_df %&amp;gt;% haven::write_sav(&quot;sav/Simulated_Dataset.sav&quot;)
#create more SPSS files from the same dataframe to test file conversion functions
combined_df %&amp;gt;% filter(sex==1) %&amp;gt;% write_sav(&quot;sav/female_only.sav&quot;)
combined_df %&amp;gt;% filter(sex==2) %&amp;gt;% write_sav(&quot;sav/male_only.sav&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Assuming the functions above are already loaded in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R environment&lt;/code&gt;, you simply call each function with the path to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sav&lt;/code&gt; directory that you created at the beginning of the simulated data creation in place of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;filepath&lt;/code&gt;, as follows:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#using Function 1
RIO_SPSS2CSV(&quot;Drive://path/to/sav&quot;) #make sure you provide full file path as in the example, NOT relative path

#using Function 2
RIO_SPSS2CSV_VL(&quot;Drive://path/to/sav&quot;) #make sure you provide full file path as in the example, NOT relative path

#using Function 3
FOR_SPSS2CSV(&quot;Drive://path/to/sav&quot;) #make sure you provide full file path as in the example, NOT relative path
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After each run of the functions above, you’ll see the SPSS files within your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sav&lt;/code&gt; folder converted to CSV files with corresponding names, as shown in the screenshot of my &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sav&lt;/code&gt; directory below:
&lt;img src=&quot;/assets/img/spss2csv_01.png&quot; alt=&quot;SPSS to CSV conversion&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;processing-time-and-choice-of-option&quot;&gt;Processing time and choice of option&lt;/h4&gt;
&lt;p&gt;I used the &lt;a href=&quot;https://cran.r-project.org/web/packages/tictoc/&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tictoc&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; package to get processing times for each of the functions. For the simulated data above, my processing times were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.12s&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.08s&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.05s&lt;/code&gt; for &lt;strong&gt;Functions 1&lt;/strong&gt;, &lt;strong&gt;2&lt;/strong&gt; and &lt;strong&gt;3&lt;/strong&gt; respectively. These functions were tested using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R Version 4.0.1&lt;/code&gt; in the &lt;strong&gt;RStudio&lt;/strong&gt; environment, on an Intel i7-6700 (3.4 GHz) machine with 32GB RAM and an SSD, running Windows 10. I also tested the functions on an actual SPSS dataset: data from a very large household survey spread over multiple folders and files, each file with 160 to over 1000 observations (rows) and five to over 50 variables (columns). Altogether, 202 SPSS files were processed across four folders, with the directory structure as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;basedir
+--subfolder1
    +--subsubfolder1.1 (49 SPSS files, 3.47MB)
    +--subsubfolder1.2 (50 SPSS files, 4.36MB)
+--subfolder2
    +--subsubfolder2.1 (51 SPSS files, 5.72MB)
    +--subsubfolder2.2 (52 SPSS files, 5.46MB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In terms of processing time (&lt;em&gt;averaged over three runs for each function&lt;/em&gt;), &lt;strong&gt;Function 1&lt;/strong&gt; took the longest, followed by &lt;strong&gt;Function 3&lt;/strong&gt;, with &lt;strong&gt;Function 2&lt;/strong&gt; being the fastest (see table below for summary).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Function&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
      &lt;th&gt;Processing time (seconds)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Function 1&lt;/td&gt;
      &lt;td&gt;Function using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert()&lt;/code&gt; function from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt; package.&lt;/td&gt;
      &lt;td&gt;71.48&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Function 2&lt;/td&gt;
      &lt;td&gt;Function using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export()&lt;/code&gt; function in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt; package with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;characterize()&lt;/code&gt; option to write value labels for categorical variables.&lt;/td&gt;
      &lt;td&gt;52.10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Function 3&lt;/td&gt;
      &lt;td&gt;Function using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt; package to read SPSS files and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write.csv()&lt;/code&gt; function to write CSV files.&lt;/td&gt;
      &lt;td&gt;56.86&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;So, judging by processing time alone, the obvious choice is &lt;strong&gt;Function 2&lt;/strong&gt;. However, if you like the options that the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foreign&lt;/code&gt;&lt;/strong&gt; package provides for generating different outputs to account for different types of variables in SPSS, you might still consider using &lt;strong&gt;Function 3&lt;/strong&gt; above, as those options can be important, especially for data from social surveys.&lt;/p&gt;

&lt;p&gt;To sum up, if you simply want to read SPSS files to work with them in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R environment&lt;/code&gt;, a package like &lt;a href=&quot;https://cran.r-project.org/web/packages/haven/index.html&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haven&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; or &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rio&lt;/code&gt;&lt;/strong&gt; (which wraps &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haven&lt;/code&gt;&lt;/strong&gt;, among other packages, within its functions) gives you better options for reading and using metadata-rich formats like SPSS. On the other hand, if you simply want to mass-convert SPSS files to CSV files, you can pick &lt;strong&gt;Function 2&lt;/strong&gt; or &lt;strong&gt;Function 3&lt;/strong&gt;, depending on the kind of data you have in SPSS and the options you require in the conversion.&lt;/p&gt;
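
&lt;p&gt;As a rough illustration of the first route, here is a minimal sketch that assumes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haven&lt;/code&gt; package is installed. The toy data, the variable names and the temporary file are made up for illustration and are not the files used in the timings above.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(haven)

# Build a tiny labelled data set and save it as a temporary .sav file,
# so the sketch runs without any real survey data (all names are hypothetical)
toy &lt;- data.frame(id = 1:3)
toy$sex &lt;- labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))
sav_file &lt;- tempfile(fileext = &quot;.sav&quot;)
write_sav(toy, sav_file)

# read_sav() keeps the SPSS value labels as labelled vectors
survey &lt;- read_sav(sav_file)

# as_factor() converts labelled variables into ordinary R factors
survey_factors &lt;- as_factor(survey)
str(survey_factors)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Whether you keep the labelled vectors or convert them to factors depends on how much of the SPSS metadata you need downstream.&lt;/p&gt;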

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Post updated on &lt;strong&gt;16 June 2020&lt;/strong&gt; to include the section on simulated SPSS data. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;To keep this post at a manageable length, I’ve left out some of the checks and verification you can do on the simulated data, which you can see in &lt;a href=&quot;https://martinctc.github.io/blog/vignette-simulating-a-minimal-spss-dataset-from-r/&quot;&gt;Martin’s post here&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Wed, 10 Jun 2020 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/research/2020/06/10/Mass-conversion-of-SPSS-to-CSV-in-R/</link>
        <guid isPermaLink="true">https://poudyal.me/research/2020/06/10/Mass-conversion-of-SPSS-to-CSV-in-R/</guid>
        
        <category>rstats</category>
        
        <category>datawrangling</category>
        
        <category>reference</category>
        
        
        <category>research</category>
        
      </item>
    
      <item>
        <title>My Writing Workflow</title>
        <description>&lt;p&gt;&lt;img src=&quot;/assets/img/writing-workflow.png&quot; alt=&quot;My writing workflow&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Following on from my previous post about &lt;a href=&quot;https://poudyal.me/research/2017/07/21/My-data-workflow/&quot;&gt;my data workflow&lt;/a&gt;, I outline my basic writing workflow here. As mentioned in the &lt;a href=&quot;https://poudyal.me/research/2017/07/21/My-data-workflow/&quot;&gt;previous post&lt;/a&gt;, I use &lt;a href=&quot;https://www.literatureandlatte.com/scapple.php&quot;&gt;Scapple&lt;/a&gt; as a tool to organise my thoughts, brainstorm, and plan my work, including a basic outline for my write-up (background of the figure above).&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;I start my initial drafts, particularly the methods and results sections, within RStudio, as that is where I do my data analysis and visualisation, and it is simply easier to write about methods and results while they are being worked on. However, for most of my original writing, I use &lt;a href=&quot;https://www.literatureandlatte.com/scrivener.php&quot;&gt;Scrivener&lt;/a&gt;, with &lt;a href=&quot;https://www.sonnysoftware.com/bookends/bookends.html&quot;&gt;Bookends&lt;/a&gt; as my reference manager. I can easily export Markdown written within RStudio in a format like RTF for Scrivener. Scrivener is one of the very best applications available for academic writing, possibly for any kind of writing, as it allows you to organise your writing in small segments, set targets (word count) and track progress easily, as well as collate and organise research materials, such as relevant papers, snippets or any other kind of material. While I use my Mac for most of my writing, Scrivener is also available for Windows, and I often work in Scrivener on Windows, especially in the office, where I have to use a Windows PC. I normally use Dropbox to store my Scrivener projects so I can pick up where I left off on any of my computers, including on my iPad with the Scrivener app. On the Mac, Bookends works very well with Scrivener as a reference manager; however, if you are working on a Windows PC, you can easily use Endnote or Mendeley for reference management and to insert citations into your write-up (as citation codes that can later be scanned automatically by a reference manager like Endnote to create a formatted bibliography).&lt;/p&gt;

&lt;p&gt;Once I have a full draft of the paper/report, I export it from Scrivener to a specialised word processing application. If I am working alone on the project and do not need others to edit the text, I often work in &lt;a href=&quot;https://www.mellel.com/mellel/&quot;&gt;Mellel&lt;/a&gt; on my Mac. Mellel is one of the best and most stable word processing applications on macOS, especially when you are writing a long text document, such as a thesis, and it works perfectly with Bookends for reference management. For example, I finalised my PhD dissertation in Mellel with Bookends for reference management. When I needed my thesis chapters to be commented on, I sent them as RTF to my supervisors so they could comment on them using MS Word. But if you do not need others to directly edit the document, Mellel can export the document as a PDF directly from the main menu.&lt;/p&gt;

&lt;p&gt;When I am working on a co-authored paper, I move from Scrivener to MS Word once other authors need to work on the paper as well, as everybody I work with uses Word and is comfortable working in it (I don’t know of any co-authors who use Mellel, for example). I have on occasion used Google Docs when I’ve wanted input from more than one co-author at the same time, and also to make versioning easier. However, being online-only makes Google Docs hard to use, especially when your collaborators are travelling or are in places with poor internet connections. Hence MS Word is usually the go-to application when working on a co-authored paper. In terms of reference managers, there are a number of options that work well with Word. Bookends works with Word on a Mac as well, but as it is not available on Windows, I use either Endnote or Mendeley for organising references and citations when working on co-authored papers.&lt;/p&gt;

&lt;p&gt;Once the papers are finalised on Word (or Mellel), I convert them to PDF for submission to journals or for wider circulation if they are research reports or working papers.&lt;/p&gt;

&lt;hr /&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A note on the use of proprietary/for-cost applications, and the availability of free/open-source alternatives. While I like to use free and open-source applications as much as possible, they also have to have the features that you are after. Sometimes, you just want an application that works out of the box without much tweaking. For these reasons, I do have quite a few for-cost applications in my workflow; however, some of these applications do have potential free/open-source alternatives. While there are some free/open-source mind-mapping tools, I haven’t found one that is as easy to use and flexible as &lt;strong&gt;Scapple&lt;/strong&gt; and that works seamlessly on both Mac and Windows. &lt;strong&gt;RStudio&lt;/strong&gt; comes in a free, open-source edition. For the main writing environment, again I don’t know of any free/open-source alternative to &lt;strong&gt;Scrivener&lt;/strong&gt; with a similar set of features. &lt;strong&gt;Scapple&lt;/strong&gt; and &lt;strong&gt;Scrivener&lt;/strong&gt; both have a slightly cheaper &lt;em&gt;Education licence&lt;/em&gt;. For reference management, &lt;a href=&quot;https://www.mendeley.com&quot;&gt;Mendeley&lt;/a&gt; is free but requires an online account (free), and there are other similar free alternatives like &lt;a href=&quot;https://endnote.com/product-details/basic&quot;&gt;Endnote basic&lt;/a&gt; or &lt;a href=&quot;https://www.zotero.org&quot;&gt;zotero&lt;/a&gt;. For final writing, the free and open-source &lt;a href=&quot;https://www.libreoffice.org&quot;&gt;LibreOffice&lt;/a&gt; is more or less a complete replacement for the MS Office suite, and its &lt;strong&gt;Writer&lt;/strong&gt; can be used instead of &lt;strong&gt;MS Word&lt;/strong&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 22 Jul 2017 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/research/2017/07/22/My-writing-workflow/</link>
        <guid isPermaLink="true">https://poudyal.me/research/2017/07/22/My-writing-workflow/</guid>
        
        <category>writeup</category>
        
        <category>workflows</category>
        
        
        <category>research</category>
        
      </item>
    
      <item>
        <title>My Data Workflow</title>
        <description>&lt;p&gt;&lt;img src=&quot;/assets/img/data-workflow.png&quot; alt=&quot;My data workflow&quot; /&gt;&lt;/p&gt;

&lt;p&gt;First of all, I use &lt;a href=&quot;https://www.literatureandlatte.com/scapple.php&quot;&gt;Scapple&lt;/a&gt; as a tool to organise my thoughts, brainstorm, and plan my work (background on the figure above).&lt;/p&gt;

&lt;p&gt;Most of the data I work with comes from structured surveys. The original raw data is usually entered and cleaned in Excel, primarily because virtually everybody knows how to work with Excel. Once the data is cleaned and ready to be analysed, I export it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.csv&lt;/code&gt; format. If the data is also going to be deposited in public data archives, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.csv&lt;/code&gt; is one of the most commonly accepted formats. I then import the data into R for analysis and visualisation. I use &lt;a href=&quot;https://rstudio.com/&quot;&gt;RStudio&lt;/a&gt; as the main work environment: for data organising and manipulation (using packages like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dplyr&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reshape2&lt;/code&gt;), for analysis and visualisation (packages like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gmnl&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot2&lt;/code&gt;; also see my other post about &lt;a href=&quot;https://poudyal.me/research/2017/07/04/My-favourite-visualisation-packages-in-R/&quot;&gt;my favourite visualisation packages&lt;/a&gt;), and also for initial drafts of my reports/papers (using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmarkdown&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;knitr&lt;/code&gt;).&lt;/p&gt;
</description>
        <pubDate>Fri, 21 Jul 2017 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/research/2017/07/21/My-data-workflow/</link>
        <guid isPermaLink="true">https://poudyal.me/research/2017/07/21/My-data-workflow/</guid>
        
        <category>data</category>
        
        <category>workflows</category>
        
        
        <category>research</category>
        
      </item>
    
      <item>
        <title>My favourite visualisation packages in R</title>
        <description>&lt;p&gt;Over the past two years I’ve used &lt;a href=&quot;http://r-project.org&quot;&gt;R&lt;/a&gt; within &lt;a href=&quot;http://rstudio.com&quot;&gt;RStudio&lt;/a&gt; environment as my only data analysis/visualisation application for my research. For the most part I’m a self-taught R/RStudio user, and I’m quite pleased with how far I’ve come in terms of being able to do pretty much everything I need in terms of data analysis and visualisation, and a significant part of writing up using RMarkdown in RStudio. In terms of data visualisation in R, I guess &lt;a href=&quot;http://ggplot2.org&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot2&lt;/code&gt;&lt;/a&gt; is what everybody turns to first, and I’m no exception. I love &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; and the flexibility it allows in terms of creating figures. However, there are some other packages which let you create some interesting plots either for exploratory analysis or from the regression outputs. I briefly discuss two such packages that have become my favourites over the last couple of years.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;http://www.cbs.dtu.dk/~eklund/beeswarm/&quot; target=&quot;_blank&quot;&gt;beeswarm&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;I’ve become a huge fan of &lt;strong&gt;beeswarm plots&lt;/strong&gt; ever since I discovered this package while looking for ways to plot individual overlapping points on a two-dimensional plot. Not only does this package allow us to plot individual data points that would otherwise overlap, it also lets us save the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beeswarm&lt;/code&gt; plot data as a data table, which can then be plotted using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; with additional dimensions as necessary. This is exactly what I did for the figure below, which formed part of &lt;a href=&quot;http://www.sciencedirect.com/science/article/pii/S095937801630005X&quot;&gt;this journal paper&lt;/a&gt; published in 2016. In addition to the location of data points in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beeswarm&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hex&lt;/code&gt; arrangement, we changed the colour as well as the size of the data points based on additional information in two other variables. The resulting plot is a simple representation of the distance of respondents’ dwellings from the park boundary, while also providing much richer information without becoming too complicated or confusing to look at.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/gec_beeswarm_plot.jpg&quot; alt=&quot;Beeswarm plot&quot; /&gt;&lt;/p&gt;
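
&lt;p&gt;The general idea can be sketched roughly as follows. This is not the code behind the figure above: the toy data and variable names are made up, and the sketch only assumes that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beeswarm&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot2&lt;/code&gt; packages are installed.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(beeswarm)
library(ggplot2)

# Toy data standing in for survey data (hypothetical variables)
set.seed(1)
toy &lt;- data.frame(
  group    = rep(c(&quot;A&quot;, &quot;B&quot;), each = 50),
  distance = c(rnorm(50, mean = 5), rnorm(50, mean = 7))
)

# do.plot = FALSE skips drawing and returns the computed point positions
# (columns include x, y, x.orig and y.orig)
swarm &lt;- beeswarm(distance ~ group, data = toy,
                  method = &quot;hex&quot;, do.plot = FALSE)

# Re-plot the swarm positions with ggplot2, colouring points by group
ggplot(swarm, aes(x = x, y = y, colour = x.orig)) +
  geom_point() +
  scale_x_continuous(breaks = 1:2, labels = c(&quot;A&quot;, &quot;B&quot;)) +
  labs(x = &quot;Group&quot;, y = &quot;Distance&quot;, colour = &quot;Group&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the positions are in a data frame, further variables can be mapped to aesthetics such as size in the usual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; way.&lt;/p&gt;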

&lt;h4&gt;&lt;a href=&quot;http://www.strengejacke.de/sjPlot/&quot; target=&quot;_blank&quot;&gt;sjPlot&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;I first came across the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sjPlot&lt;/code&gt; package while trying to find a way to create nicely formatted tables for regression outputs in R. Over time, however, I’ve used this package more to visualise results from the different types of statistical analyses I carry out in R for my socio-economic research. I guess that is not surprising given the package description, which I quote below:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;blockquote&gt;
    &lt;p&gt;“Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, histograms, box plots, (generalized) linear models, mixed effects models, PCA and correlation matrices, cluster analyses, scatter plots, Likert scales, effects plots of interaction terms in regression models, constructing index or score variables and much more.”&lt;/p&gt;
  &lt;/blockquote&gt;
&lt;/blockquote&gt;

&lt;p&gt;Among several other types of plots, I used this package to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;odds-ratio plot&lt;/code&gt; shown below, which featured in &lt;a href=&quot;http://www.sciencedirect.com/science/article/pii/S095937801630005X&quot;&gt;our journal paper&lt;/a&gt; published in 2016.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/gec_or_plot.jpg&quot; alt=&quot;Odds-ratio plot&quot; /&gt;&lt;/p&gt;
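
&lt;p&gt;For a flavour of how such a plot can be produced, here is a minimal sketch using a built-in dataset rather than the data from the paper. It assumes a recent version of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sjPlot&lt;/code&gt; that provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plot_model()&lt;/code&gt;, which is not necessarily the function used for the figure above.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(sjPlot)

# Logistic regression on the built-in mtcars data (illustration only)
fit &lt;- glm(am ~ hp + wt, data = mtcars, family = binomial)

# For a binomial model, plot_model() plots exponentiated estimates,
# i.e. odds ratios, with their confidence intervals
plot_model(fit)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;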

&lt;p&gt;I often use the &lt;a href=&quot;https://cran.r-project.org/web/packages/sjmisc/index.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sjmisc&lt;/code&gt;&lt;/a&gt; package together with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sjPlot&lt;/code&gt;, especially to create nice variable labels to use in plots or tables. Other reasons why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sjPlot&lt;/code&gt; is among my favourite packages are its active development, &lt;a href=&quot;https://strengejacke.wordpress.com&quot;&gt;a very useful set of blog posts with examples&lt;/a&gt;, and the prompt responses to comments on these posts whenever I’ve had queries about the package.&lt;/p&gt;
</description>
        <pubDate>Tue, 04 Jul 2017 00:00:00 +0000</pubDate>
        <link>https://poudyal.me/research/2017/07/04/My-favourite-visualisation-packages-in-R/</link>
        <guid isPermaLink="true">https://poudyal.me/research/2017/07/04/My-favourite-visualisation-packages-in-R/</guid>
        
        <category>rstats</category>
        
        <category>dataviz</category>
        
        
        <category>research</category>
        
      </item>
    
  </channel>
</rss>
