Facebook posts reveal your hidden illnesses, say researchers

Does your stomach hurt? Do you tell your friends on Facebook?

If so, researchers suggest you might be suffering from depression, and there’s a good chance that analyzing your social media posts could flag it months earlier than a clinical diagnosis alone would.

In a study from Penn Medicine and Stony Brook University that was published in PLOS ONE, researchers claim that they can diagnose someone based on their social media posts, given that the language people use can point to conditions such as diabetes, anxiety, depression and psychosis.

In their paper, the researchers described using natural language processing to analyze 949,530 Facebook posts made by 999 study participants, for a total of 20,248,122 words.

They looked for markers of 21 medical conditions, and they found that all of them were predictable from Facebook language at rates better than chance. Some conditions were particularly well predicted by a combination of demographics and Facebook language, compared with demographics alone: namely, diabetes, pregnancy, anxiety, psychoses, and depression.

One example of how language can strongly predict a diagnosis is alcohol abuse. Alcohol abuse was marked by use of the words “drink,” “drunk,” and “bottle,” they said. That’s a pretty intuitive diagnosis, but other predictions weren’t so obvious: for example, people who use the words “god,” “family” and “pray” are 15 times more likely to have been diagnosed with diabetes.

Other correlations:

  • Use of hostile language – e.g. “people,” “dumb,” “bulls**t,” “b**ches” – was a predominant marker associated with drug abuse as well as psychoses.
  • Those suffering from depression tend to use words associated with the physical symptoms of anxiety – “stomach,” “head,” “hurt” – and with emotional distress – “pain,” “crying,” “tears.”

Should you offer insulin to somebody who mentions praying and God? No, the researchers say: clearly, not everyone mentioning the words they tracked has a particular medical condition. Rather, those mentioning key words are more likely to have a given, correlated condition, they said.
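To make the idea of word markers concrete, here is a minimal sketch in Python. It is not the researchers' method: the marker lists are simply the example words quoted above, and the `MARKERS` table, the `tokenize` and `marker_rates` functions, and the sample posts are all hypothetical. It only measures what share of a user's words fall into each marker list, which is a correlation signal and not a diagnosis.

```python
import re
from collections import Counter

# Hypothetical marker lists, taken only from the example words quoted in this article.
MARKERS = {
    "alcohol abuse": {"drink", "drunk", "bottle"},
    "diabetes": {"god", "family", "pray"},
    "depression": {"stomach", "head", "hurt", "pain", "crying", "tears"},
}

def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def marker_rates(posts):
    """Return, per condition, the share of a user's words that are marker words."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    total = sum(counts.values())
    if total == 0:
        return {condition: 0.0 for condition in MARKERS}
    return {
        condition: sum(counts[word] for word in words) / total
        for condition, words in MARKERS.items()
    }

# Invented sample posts, for illustration only.
posts = [
    "My stomach and head hurt again today",
    "Family dinner tonight",
]
rates = marker_rates(posts)
```

A real model would weigh thousands of words and topics alongside demographics. A rate like this is, at best, one input among many.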

No, your doctor won’t be e-stalking you

The researchers say that a helpful thing about social media is that it’s a two-way communication channel: it gives clinicians a built-in way to talk with patients. That doesn’t mean that they’ll be eavesdropping on your posts all the time. But given their research, they think effective models could be built to help treat patients who opt in to a system that lets clinicians analyze their social media writings.

At any rate, Facebook is already eavesdropping, at least with regard to detecting suicidal thoughts. In September, the platform explained how, in the previous year, it had started to use machine learning to look for such thoughts in users’ posts.

Facebook’s post about the AI use, written by Catherine Card, Director of Product Management, is an interesting read, as it spells out the difficulties of teaching a machine linguistic nuance. For example, how do you give AI enough contextual understanding to glean that “I have so much homework I want to kill myself” isn’t a genuine cry of distress?

Facebook made a breakthrough when it realized that it could use false alarms as a training set. It had such a collection: in 2015, it introduced new ways for users to flag their friends’ suicidal notes. The posts were reviewed by humans – trained Community Operations reviewers – to determine whether the writer was actually at risk of self-harm. The posts that reviewers found had been incorrectly flagged as suicidal gave Facebook more data with which to train its classifiers to recognize genuine suicidal expressions more precisely.
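The false-alarm idea can be sketched in a few lines of Python. This is a toy illustration, not Facebook's system: Facebook's post doesn't spell out its classifiers at this level of detail, so a simple naive Bayes word model stands in for them here, and the `train` and `log_odds` functions and the tiny labeled set are invented for the example. Posts that reviewers judged to be false alarms serve as the negative examples.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_posts):
    """labeled_posts: (text, label) pairs; True = reviewer confirmed risk,
    False = reviewer judged the flag a false alarm."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_posts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def log_odds(model, text):
    """Naive Bayes log-odds: positive leans 'confirmed', negative leans 'false alarm'."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    score = math.log(doc_counts[True] / doc_counts[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # unseen words carry no evidence in this sketch
        # Add-one smoothing keeps words seen in only one class from zeroing out.
        p_true = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_false = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_true / p_false)
    return score

# Invented, hypothetical training data; the first false alarm is the example quoted above.
labeled = [
    ("I feel hopeless and alone", True),
    ("I feel so alone and hopeless tonight", True),
    ("I have so much homework I want to kill myself", False),
    ("This commute makes me want to kill myself lol", False),
]
model = train(labeled)
hyperbole_score = log_odds(model, "So much laundry I want to kill myself")
distress_score = log_odds(model, "I feel alone")
```

With the false alarms in the training set, the hyperbolic phrasing scores below zero while the plainer expression of distress scores above it. Four posts prove nothing, of course; the point is only that mislabeled flags are useful data.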

But the Penn researchers aren’t advocating for an expansion of Facebook as an AI Big Brother that scans all our posts with or without our say-so. Rather, their work shows that an opt-in system for patients who agree to having their social media posts analyzed could provide extra information for their healthcare teams to use in refining their medical care.

Lead author Raina Merchant, the director of Penn Medicine’s Center for Digital Health and an associate professor of Emergency Medicine, told Science Daily that her team’s recent work builds on a previous study that showed that analysis of Facebook posts could predict a diagnosis of depression up to three months earlier than a clinical diagnosis. She said that it’s tough to predict how widespread an opt-in social media post analysis system would be, but that it could be useful for patients who are frequent social media users:

For instance, if someone is trying to lose weight and needs help understanding their food choices and exercise regimens, having a healthcare provider review their social media record might give them more insight into their usual patterns in order to help improve them.

Ever mention donuts in your posts? One imagines that information could come in handy.

Similar to how Facebook now allows users to flag posts within their network that they think may suggest suicidal ideation, clinicians could get early warnings about a broader set of conditions, the researchers said:

A patient-centered approach [similar to Facebook’s suicide filters] could be applied to a broader set of conditions allowing individuals and their networks (for those who opt-in) to have early insights about their health-related digital footprints.

Privacy, informed consent, and data ownership

If the researchers are correct in claiming that you can make a diagnosis from public social media posts, then this is a great illustration of how much information people are sharing without being aware of it. The researchers make that exact point, in fact, pointing to the questions about privacy, informed consent, and data ownership that their work raises.

The extra ease with which social media access can be obtained creates extra obligations to ensure that consent for this kind of use is understood and intended. Efforts are needed to ensure users are informed about how their data can be used, and how they can recall such data. At the same time, such privacy concerns should be understood in the context of existing health privacy risks. It is doubtful that social media users fully understand the extent to which their health is already revealed through activities captured digitally.

The issue is that people don’t always understand that the whole is greater than the parts. We all might think we’re sharing little snippets that don’t amount to anything particularly revealing, but when we think that way, we miss the fact that a million little snippets add up to a very Big Data picture.

But we should also bear in mind that the more data you have, the more spurious correlations it will contain. As the researchers said, just because you use a given set of words doesn’t mean that you’re alcoholic/diabetic/depressive/pregnant/a drug abuser.

Sometimes, a cigar is just a cigar.