Item response theory (IRT) models estimate a latent underlying quantity for individual units (legislators, judges, students) based on a series of binary indicator variables for each unit (votes, decisions, test answers). The basic two parameter IRT model is given by:

\[ \begin{align} \text{Pr}(y_{ij} = 1) = \text{logit}^{-1}(\beta_j\theta_i - \alpha_j) \end{align} \]

Where \(\beta_j\) is the discrimination parameter for indicator \(j\), which determines how much a one or zero on this indicator tells us about subject (country, legislator, respondent, etc.) \(i\)’s latent construct \(\theta_i\) and \(\alpha_j\) is a baseline difficulty parameter for indicator \(j\) that \(\theta_i\) must overcome for us to observe a one.

Estimation in R with MCMCpack

We’re going to fit a one dimensional IRT model on the UN Voting Data from Bailey, Strezhnev, and Voeten (2017), using countries as the unit of analysis. Following their analysis, the latent construct will be alignment with the US led liberal world order. Our indicators are UN General Assembly votes, and we’re going to limit our analysis to 2015 so we don’t have to deal with changes in alignment over time.

library(MCMCpack) # load first becuase MASS::select() conflicts with dplyr::select()
library(tidyverse)
library(countrycode)

# read in UN voting data
UN <- read.delim('UNVotesPublished.tab', sep = '\t', quote = '')

# subset just to voting data
UN <- UN %>% filter(year == 2015) %>% select(ccode, vote)

# reshape data and assign country names
votes <- plyr::ddply(UN, 'ccode', function(x) t(x$vote)) %>% 
  mutate(ccode = countrycode(ccode, origin = 'cown',
                              destination = 'country.name')) %>%
  replace_na(list(ccode = 'Vietnam')) # Vietnam is the only failure to match

We’re going to treat abstentions as no votes, and absences as missing data (these data illustrate the limitations of canned code; a more accurate model could use a categorical logit instead of a binomial one because states often abstain for reasons separate from voting yea or nay).

# recode vote values and convert to matrix
data_mat <- votes %>%
  select(-ccode) %>% 
  apply(2, function(x) ifelse(ifelse(x %in% 2:3, 0, x) >= 8, NA,
                              ifelse(x %in% 2:3, 0, x)))

# add rownames to votes matrix for presentation
rownames(data_mat) <- votes$ccode

The last step is to run our model. In order to identify the model, we need to impose some sort of restriction on the data. We can hard-code the alignment values of certain countries, constrain two or more countries’ alignments to opposite signs, or specify starting values for the MCMC sampler which place two countries on different sides of zero. Let’s constrain the signs of two countries we know to be opposed since that is less restrictive the the first option and simpler than the third one.

# fit IRT model, fixing Cuba and US to be opposite signs
irt <- MCMCirt1d(datamatrix = data_mat,
                 theta.constraints = list('Cuba' = '-', 'United States' = '+'))

The ggmcmc package contains several useful functions for extracting data from mcmc objects and plotting results. The ggs() function collects information on MCMC samples into dataframes that can be used by functions like ggs_caterpillar() to produce summary plots.

library(ggmcmc)

# get estimates for plottig
irt_dat <- ggs(irt)

# coefficient plot
ggs_caterpillar(irt_dat, family = 'theta')

This plot is helpful, but not the clearest or most readable. Let’s take the samples contained in irt_dat and create our own dataframe for plotting with ggplot().

# take mean and 95% quantiles by country, trim names, and sort by mean
irt_gg <- plyr::ddply(irt_dat, 'Parameter',
                      function(x) c(mean(x$value),
                                   quantile(x$value, probs = c(.025, .975)))) %>%
  mutate(Parameter = gsub('theta.', '', Parameter)) %>%
  rename(mean = V1, lo = `2.5%`, hi = `98%`) %>%
  dplyr::arrange(mean) %>%
  mutate(Parameter = factor(Parameter, levels = as.character(Parameter)),
         panel = as.numeric(cut(1:n(), breaks = 3)))

# plot estimates
ggplot(irt_gg, aes(x = Parameter, y = mean, ymin = lo, ymax = hi)) +
  geom_linerange() +
  geom_point() +
  coord_flip() +
  facet_wrap(~ panel, scales = 'free') +
  theme_bw() +
  labs(x = '', y = '') +
  theme(axis.text.y = element_text(size = 5))

There are very high levels of uncertainty around the estimates of latent alignment for a handful of states. One of the most common causes of this exceptionally high uncertainty is when many indicators are not observed for a subject. If we take a look at the percentage of absences by country, we see that some of these countries have the highest rates of absenteeism. São Tomé & Príncipe has an uncertainty orders of magnitudes greater than its neighbors in the distribution of alignments and was absent for 99% of General Assemlby votes in 2015; just because we can get estimates for every country in our data doesn’t mean they’ll be good estimates.

data.frame(Absent = apply(data_mat, 1, function(x) mean(is.na(x)))) %>%
  rownames_to_column('Country') %>%
  arrange(desc(Absent)) %>%
  filter(Absent > 0)

Hierarchical IRT Models

We can also estimate hierarchical IRT models which allow the mean of each alignment estimate to be a function of country level predictors.

\[ \begin{align} \text{Pr}(y_{ij} = 1) &= \text{logit}^{-1}(\beta_j\theta_i - \alpha_j) \\ \theta_i &\sim \mathcal{N}(\mathbf{x}_i\boldsymbol{\gamma}, \sigma^2) \end{align} \]

where \(boldsymbol{\gamma}\) is a vector of regression coefficients that controls the influence of the hierarchical predictors on the mean of the latent alignment for each country. Annoyingly, MCMCpack refers to the hierarchical predictor coefficients as \(\boldsymbol{\beta}\), but I’ve used \(\boldsymbol{\gamma}\) above as the convention in the IRT literature is that \(\boldsymbol{\beta}\)s are discrimination parameters. This model is also known as the multiple indicators, multiple causes (MIMIC) model as we observe multiple indicators which multiple predictors affect through the latent quantity. Let’s extend our UN voting IRT model by adding in a measure of polyarchy from V-Dem (Coppedge et al. 2017) and trade as a percentage of GDP from the World Bank (World Bank 2018).

library(WDI)
library(vdem)

# download trade as a percent of GNI data
WB <- WDI(country = 'all',
          indicator = c('NE.TRD.GNFS.ZS'), start = '2015', end = '2015') %>%
  select(ccode = country, trade = NE.TRD.GNFS.ZS)

# read in polyarchy from VDEM
vdem <- extract_vdem(name_pattern = 'v2x_polyarchy',
                     include_sd = T, include_uncertainty = F) %>% 
  filter(year == 2015) %>%
  select(ccode = vdem_country_name, polyarchy = v2x_polyarchy)

# join hierarchical predictors and drop missing values
votes_hier <- votes %>% left_join(vdem) %>% left_join(WB) %>%
  filter(!is.na(trade), !is.na(polyarchy))

We’ve lost 0 rows from our data due to listwise deleting countries with missing hierarchical predictors. For actual research, you’d want to multiply impute these missing values, perform your IRT analysis on each imputed dataset, and then pool samples from all models to account for imputation uncertainty in your inference. For now, we’re just going to remove those countries from our data, so go ahead and create a new votes matrix from this subset of countries.

# recode vote values and convert to matrix
data_mat_hier <- votes_hier %>%
  select(-ccode, -trade, -polyarchy) %>% 
  apply(2, function(x) ifelse(ifelse(x %in% 2:3, 0, x) >= 8, NA,
                              ifelse(x %in% 2:3, 0, x)))

# add rownames to votes matrix for presentation
rownames(data_mat_hier) <- votes_hier$ccode

The last step we need to do is extract our hierarchical predictors into a separate dataframe. The data are on very different scales because polyarchy is a normalized variable from its own measurement model, and some countries have trade valued higher than their total GDP.

# plot densities of hierarchical predictors
votes_hier %>% select(polyarchy, trade) %>%
  reshape2::melt() %>%
  ggplot(aes(x = value)) +
  geom_density(fill = 'grey', color = NA) +
  facet_wrap(~ variable, scales = 'free') +
  labs(x = '', y = '') +
  theme_bw()

To speed up the sampler and get meaningful results, we need to standardize both variables using the scale() function.

# extract hierarchical predictors
xjdata <- votes_hier %>% select(trade:polyarchy) %>%
  mutate_all(function(x) as.numeric(scale(x)))

Once we’ve done that, we can estimate our hierarchical IRT model. Note that the information contained in the hierarchical predictors means that we no longer need to constrain the latent alignments of two countries to identify our model!

# fit IRT model
irt_hier <- MCMCirtHier1d(datamatrix = data_mat_hier, Xjdata = xjdata)

We can make a quick coefficient plot for the hierarchical predictors.

# create ggmcmc object and subset to hierarchical predictors
irt_hier_dat <- ggs(irt_hier)

# coefficient plot of hierachical predictors
ggs_caterpillar(irt_hier_dat %>% filter(grepl('polyarchy|trade', Parameter)), line = 0) +
  scale_y_discrete(labels = c('Polyarchy', 'Trade'))+
  theme_bw()

We can also extract the means of the estimates of our latent alignments just like with the regular IRT model.

# take mean and 95% quantiles by country, trim names, and sort by mean
irt_hier_gg <- plyr::ddply(irt_hier_dat, 'Parameter',
                      function(x) c(mean(x$value),
                                   quantile(x$value, probs = c(.025, .975)))) %>%
  filter(!grepl('beta', Parameter)) %>% 
  mutate(Parameter = gsub('theta.', '', Parameter)) %>%
  rename(mean = V1, lo = `2.5%`, hi = `98%`) %>%
  dplyr::arrange(mean) %>%
  mutate(Parameter = factor(Parameter, levels = as.character(Parameter)),
         panel = as.numeric(cut(1:n(), breaks = 2)))

# plot estimates
ggplot(irt_hier_gg, aes(x = Parameter, y = mean, ymin = lo, ymax = hi)) +
  geom_linerange() +
  geom_point() +
  coord_flip() +
  facet_wrap(~ panel, scales = 'free') +
  labs(x = '', y = '') +
  theme_bw() +
  theme(axis.text.y = element_text(size = 5))

Obviously some countries are missing because we dropped observations missing hierarchical predictors. Notice that now the United States is at the bottom of the scale while Sudan is at the top This is a perfect illustration of the rotational invariance problem in IRT models. We forced the United States to be positive and Cuba to be negative in the first model, but not in this model. However, the countries at each end of the spectrum are largely consistent, which suggests that both identification strategies are valid and neither seriously biases our alignment estimates. To make these estimates directly comparable, all we have to do is flip the sign on the second set of estimates.

# join estimates from each model, flip hierarchical sign, and plot
irt_hier_gg %>%
  mutate(mean_hier = -mean) %>% 
  select(Parameter, mean_hier) %>%
  full_join(irt_gg %>% select(Parameter, mean)) %>%
  ggplot(aes(x = mean, y = mean_hier)) +
  geom_point() +
  labs(x = 'IRT Model', y = 'Hierarchical IRT Model') +
  theme_bw()

It looks like the estimates from each model are relatively consistent! This means that we can make inferential claims about the relationship between our hierarchical predictors and the alignment of countries without worrying that the predictors driving our measured alignments Remember that the sign flip we just performed on the alignment estimates also applies to hierarchical predictor estimates, so the negative effect of polyarchy is really a positive when the United States is at the top of the scale.

Further Extensions

You can estimate a dynamic IRT model with the MCMCdynamicIRT1d_b() function in MCMCpack where the mean of each subject’s latent construct at time \(t\) is function of the value at time \(t-1\). You can also use the grm() function in the ltm package to estimate an IRT model with ordinal instead of binary observed indicators. Unfortunately, your off the shelf options for models that combine these functionalities is limited. The more complex the model you want to use, the more likely it is that you’ll have to write it yourself. For example, Fariss (2014) uses a dynamic IRT model written in JAGS which incorporates binary, ordinal, categorical, and continuous observed indicators to measure respect for human rights in various countries across time.

References

Bailey, Michael A., Anton Strezhnev, and Erik Voeten. 2017. “Estimating Dynamic State Preferences from United Nations Voting Data.” Journal of Conflict Resolution 61 (2): 430–56.

Coppedge, Michael, John Gerring, Staffan I. Lindberg, Svend-Erik Skaaning, Jan Teorell, David Altman, Frida Andersson, Michael Bernhard, M. Steven Fish, and Adam Glynn. 2017. “V-Dem Codebook V7.”

Fariss, Christopher J. 2014. “Respect for Human Rights Has Improved over Time: Modeling the Changing Standard of Accountability.” American Political Science Review 108 (2): 297–318.

World Bank. 2018. “World Development Indicators 2018.” Washington, D.C.