Why Do Couples Break Up? Dot Plots in R

Not to be read as “couples break up because of dot plots in R.” Although…surely at least once?

Let’s look at dot plots in ggplot2, using one of my favorite subjects: other people’s sex lives.

The Humble Dot Plot

If I learned one thing from Data Visualization: A Practical Introduction, it’s the virtuosity of the dot plot.

Don’t get me wrong, I appreciate a good bar graph. But each additional series makes things cramped, and then error bars arrive and suddenly there’s way too much happening.

Take a look at the same data visualized both ways:

The code for this is pretty minimal – once the data is properly formatted, which is a big caveat.1 The dot plot alone is only a few lines:

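library(ggplot2)  # plotting
library(forcats)  # as_factor()

# static_data: one row per point, with an estimate (y) and interval bounds (low, upp)
# palette: a vector of colours, one per level of z
ggplot(static_data,
       aes(x = as_factor(x), y = y,
           ymin = low, ymax = upp,
           color = as_factor(z))) +
  # dodge the series so their ranges sit side by side instead of overlapping
  geom_pointrange(alpha = 0.8, position = position_dodge(width = 0.5), show.legend = FALSE) +
  scale_colour_manual(values = palette) +
  labs(x = "", y = "",
       title = "Light and Breezy") +
  theme_minimal()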
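For anyone who wants to run the snippet above without the original sample data, here is a minimal stand-in. The values are random and entirely made up; only the column names (x, y, low, upp, z) and the palette name are taken from the plotting code, and the two-colour palette is an assumption.

```r
# Hypothetical sample data: three categories (x), two series (z),
# a point estimate (y) and an interval (low, upp) for each combination.
set.seed(42)  # make the random values reproducible
static_data <- expand.grid(x = c("a", "b", "c"), z = c("one", "two"))
static_data$y   <- runif(nrow(static_data), min = 2, max = 8)
static_data$low <- static_data$y - 1  # lower end of the range
static_data$upp <- static_data$y + 1  # upper end of the range

# One colour per level of z (assumed: two series, two colours)
palette <- c("#0072B2", "#F0E442")
```

With these two objects defined, the ggplot call above should have everything it refers to.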

My Friend NATSAL2

Let’s look at some real data. Why do people break up?

Happily, this isn’t just a question for nosy gossips. The British government is also interested in this, at least enough to ask about it during their National Survey of Sexual Attitudes and Lifestyles (NATSAL). NATSAL is a detailed survey of sexual behavior, conducted approximately every ten years in the United Kingdom. It’s a huge, representative, stratified, all-star government survey.

The 2000 wave included a relationship history, which asked why relationships ended. The responses offered were:

  • Unfaithfulness/adultery
  • Money problems
  • Difficulties with sex life
  • Different interests/nothing in common
  • Grew apart
  • Not having children
  • Lack of respect or appreciation
  • Domestic violence
  • Arguments
  • Not sharing household chores enough
  • Moved because of change in circumstances
  • Death of partner
  • Other

Interviewees could name as many reasons as they wanted.3 If they said “other”, the interviewer would offer an additional ten reasons. I’m going to ignore those, since anyone who didn’t choose “other” didn’t hear them. I’m also going to ignore death, as it ends a relationship but isn’t exactly a breakup.

The full code is lengthy, but the general idea is:

  1. Filter down to each person’s most recent breakup, ignoring people who don’t report any.
  2. Assign each reason a proportion of “blame,” acting under the assumption that the first reason mentioned was most important, second reason second most important, and so on.4
  3. Use srvyr to summarize the data while accounting for the weighted nature of the survey.5
  4. Gracelessly pull apart the data and smush it back together in a format ggplot2 can understand.
  5. Make a picture! For each reason, plot the percentage of breakups that mention it and, for the breakups that mention it, the average amount of blame assigned.6
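The blame calculation in step 2 is the least obvious part, so here is a rough sketch of it in base R. The function name blame_scores is made up for illustration and this is not the full calculation from the post; it only assumes that reasons are ranked 1, 2, 3… in the order they were mentioned.

```r
# Hypothetical helper: blame shares for a breakup with n ranked reasons.
# Assumes reasons are ranked 1..n in the order they were mentioned.
blame_scores <- function(n_reasons) {
  stopifnot(n_reasons >= 1)
  ranks <- seq_len(n_reasons)
  # Points run from n (first reason mentioned) down to 1 (last reason)
  points <- n_reasons - ranks + 1
  # Dividing by the summed ranks makes the shares add up to 1
  points / sum(ranks)
}

round(blame_scores(3), 2)  # 0.50 0.33 0.17
```

A breakup with a single reason gives that reason all of the blame, and each extra reason shrinks everyone's share, which matters when reading the results.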

The visualization code itself is succinct:

  # two colours, one per statistic (stat_name)
  palette <- c("#0072B2", "#F0E442")
  ggplot(data,
         aes(x = reason, y = stat,
             ymin = low, ymax = upp,
             color = as_factor(stat_name))) +
    geom_pointrange(alpha = 0.8, position = position_dodge(width = 0.5)) +
    coord_flip() +  # reasons on the vertical axis, so long labels stay readable
    theme_minimal() +
    theme(legend.position = "top", legend.title = element_blank()) +
    # proportions run from 0 to 0.6; label them as percentages
    scale_y_continuous(breaks = seq(0, 0.6, by = 0.1), labels = seq(0, 60, by = 10)) +
    scale_colour_manual(values = palette) +
    labs(x = "", y = "")
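The plot wants long data: one row per reason per statistic, with columns reason, stat_name, stat, low and upp. As a sketch of the "smush it back together" step, here is one way to get there in base R from a wide summary table. The numbers are invented purely to show the shape, and the wide column names (pct, blame, and the _low/_upp suffixes), the helper to_long and the stat_name labels are all assumptions, not the post's actual code.

```r
# Made-up numbers for two reasons, purely to show the shape of the data.
# Column names (pct, blame and the _low/_upp suffixes) are assumptions.
wide <- data.frame(
  reason    = c("Grew apart", "Arguments"),
  pct       = c(0.40, 0.30),
  pct_low   = c(0.37, 0.27),
  pct_upp   = c(0.43, 0.33),
  blame     = c(0.55, 0.45),
  blame_low = c(0.52, 0.42),
  blame_upp = c(0.58, 0.48)
)

# Hypothetical helper: pull out one statistic and its interval as long rows
to_long <- function(df, prefix, label) {
  data.frame(
    reason    = df$reason,
    stat_name = label,
    stat      = df[[prefix]],
    low       = df[[paste0(prefix, "_low")]],
    upp       = df[[paste0(prefix, "_upp")]]
  )
}

# Stack the two statistics; plot_data plays the role of `data` in the plot
plot_data <- rbind(
  to_long(wide, "pct", "Share of breakups mentioning it"),
  to_long(wide, "blame", "Average blame when mentioned")
)
```

Each reason now appears twice, once per statistic, which is what lets position_dodge place the two ranges next to each other.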

What is this saying?

A couple of things jump out:

  • People most often name growing apart, adultery, and arguments as reasons for breaking up.
  • Of all the reasons, adultery is the one that gets the highest proportion of blame.7
  • A few other reasons – circumstances, not having children – aren’t mentioned often, but when they do happen, they’re important.
  • Not sharing housework gets the least blame, on average. I’d speculate this is a result of the way I calculate blame, where it drops significantly as people mention more reasons. I’d guess that housework isn’t often mentioned alone, that it overlaps with reasons like arguments and lack of respect or appreciation, which then take significant shares of blame. Similarly, I was surprised violence didn’t get a higher share of blame, but it may also tend to appear as one of several reasons.

Finally, since all research ends with a call for more research, what questions does this bring up?

With this same filtered dataset, we could investigate whether my speculation about violence is correct and it’s more important than it appears here. I could tweak the blame calculation, or limit the data to only the first reason given for each breakup and see what floats to the top.

With additional NATSAL data, we could look at earlier relationships. Do people break up for the same reasons over and over?

One thing NATSAL can’t give us: both sides of the story. Interviewing both ex-partners, getting “dyadic” data, would let us compare their answers. This would be fascinating, although dyadic data is challenging to collect, even for existing couples – never mind trying to track down everybody’s exes. Do I think we broke up for the same reasons you think we broke up?

  1. In this case, it’s sample data, which is easy. Usually it isn’t. Full code here. ↩︎
  2. Macdowall, W., Nanchahal, K., Fenton, K., Copas, A., Carder, C., Senior, M., Wellings, K., Ridgway, G., Russell, M., National Centre for Social Research, McCadden, A. (2005). National Survey of Sexual Attitudes and Lifestyles, 2000-2001. [data collection]. UK Data Service. SN: 5223, DOI: 10.5255/UKDA-SN-5223-1 ↩︎
  3. NATSAL is conducted in person, with interviewers visiting people in their homes, yet it asks about subjects that people may be hesitant to discuss with a stranger. Part of it is conducted on a computer – it sounds like the interviewer basically hands the respondent a tablet. The breakup questions are in the in-person section of the interview, but the interviewee is handed a card with the reasons listed out, each with a code letter, and asked only to say the letter aloud. ↩︎
  4. Score is based on the reason’s “rank” over the summed ranks:
    score = (total_number_of_reasons - rank_of_reason + 1) / sum_of_ranks
    For example, a breakup with three reasons will assign them scores of 0.50, 0.33, and 0.17. Full calculation here. ↩︎
  5. Individuals are assigned weights based on their demographics, so groups that are underrepresented in the survey are weighted more heavily. Calculations that incorporate these weights then get closer to representing the nation, not just the set of people who responded to the survey. ↩︎
  6. Showing both of these on the same plot is a slightly questionable choice. These two series use the same scale, 0 to 100%, so they aren’t subject to problems like different baselines, but they do represent different concepts. ↩︎
  7. This fits with other literature on infidelity, which shows very high disapproval of infidelity (97% of American adults) but also relatively high rates of it (20-30%). Those statistics are from Campbell, Kelly and David W. Wright. 2010. “Marriage Today: Exploring the Incongruence Between Americans’ Beliefs and Practices.” Journal of Comparative Family Studies 41(3):329-345. While NATSAL is a UK study and covers live-in relationships, not marriage, the general point is not especially controversial. For evidence that non-marital infidelity also receives high disapproval, see Thompson, Ashley E., and Lucia F. O’Sullivan. 2016. “Drawing the Line: The Development of a Comprehensive Assessment of Infidelity Judgments.” The Journal of Sex Research 53(8): 910–926. ↩︎