PHUSE/FDA Data Science Innovation Challenge – Opioid/Substance Use Disorder

A current challenge of the PHUSE/FDA Data Science Innovation Challenge is 'Opioid/Substance Use Disorder', which aims to facilitate effective intervention for the prevention and treatment of opioid/substance use disorder by harnessing social media data. Below are the abstracts of the accepted participants who have proposed a solution to this challenge.

Transfer Learning for Opioid Usage Monitoring
Submitted by: Mohit Juneja

A key challenge in healthcare machine learning modelling is the lack of labelled datasets. Transfer learning, in which models trained on one domain are customised for other domains, has been successfully applied across a range of NLP tasks to achieve state-of-the-art performance. Our solution combines transfer learning with a limited amount of customised data labelling to achieve highly accurate results. In addition, we use deterministic pattern matching techniques for variables that cannot be modelled using transfer learning.

Our solution extracts the following set of attributes:

  1. variables extracted via transfer learning: Gender (M/F), Pregnancy Status (Yes/No), ADHD (Yes/No), Depression (Yes/No), Adverse Events, Anatomy, Tests Conducted
  2. variables extracted via deterministic pattern matching: Age, Opiates (Names), Prescription Drugs (Names)
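
The deterministic pattern matching used for Age and drug names could look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the submitter's implementation: the regular expressions, the plausible-age bounds and the deliberately tiny lexicons (`OPIATE_NAMES`, `PRESCRIPTION_DRUGS`) are all hypothetical, and a real system would rely on curated lexicons covering brand, generic and street names.

```python
import re

# Hypothetical, deliberately tiny lexicons (assumptions, not the authors' lists).
OPIATE_NAMES = {"heroin", "fentanyl", "oxycodone", "hydrocodone", "morphine", "methadone"}
PRESCRIPTION_DRUGS = {"adderall", "xanax", "zoloft", "sertraline", "gabapentin"}

# Hypothetical age patterns, e.g. "I'm 24", "aged 19", "26 yo", "(31F)".
# The negative lookahead stops "I'm 20 weeks pregnant" from being read as an age.
AGE_PATTERNS = [
    re.compile(r"\b(?:i'?m|i am)\s+(\d{1,2})\b(?!\s*(?:weeks?|months?|days?))", re.IGNORECASE),
    re.compile(r"\baged?\s*:?\s*(\d{1,2})\b", re.IGNORECASE),
    re.compile(r"\b(\d{1,2})\s*(?:yo|y/o|years? old)\b", re.IGNORECASE),
    re.compile(r"\((\d{1,2})\s*[mf]\)", re.IGNORECASE),
]
TOKEN_RE = re.compile(r"[a-z]+")


def extract_age(text):
    """Return the first plausible age found by the patterns, or None."""
    for pattern in AGE_PATTERNS:
        match = pattern.search(text)
        if match:
            age = int(match.group(1))
            if 12 <= age <= 60:  # assumed plausibility bounds
                return age
    return None


def extract_drugs(text, lexicon):
    """Return lexicon entries that appear as whole words in the text."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return sorted(tokens & lexicon)


def extract_deterministic_fields(text):
    return {
        "age": extract_age(text),
        "opiates": extract_drugs(text, OPIATE_NAMES),
        "prescription_drugs": extract_drugs(text, PRESCRIPTION_DRUGS),
    }
```

For example, "I'm 27 and 20 weeks pregnant, still on zoloft and trying to quit oxycodone." yields an age of 27, the opiate "oxycodone" and the prescription drug "zoloft". Rules like these are transparent and easy to audit, which is one reason to keep them alongside learned models for fields such as age.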

Our solution provides the following capabilities:

  1. longitudinal records of social media users based on their comments and the dates of those comments
  2. an interactive dashboard for analysing and visualising the results of the NLP model
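
Building longitudinal records essentially means grouping comments by user and ordering each user's history by date. A minimal sketch follows, assuming a hypothetical `Comment` record holding a user handle, posting date and text; the abstract does not describe the actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Comment:
    # Hypothetical record; real data would also carry extracted attributes.
    user: str
    posted: date
    text: str


def build_longitudinal_records(comments):
    """Group comments by user and sort each user's comments by posting date."""
    timelines = defaultdict(list)
    for comment in comments:
        timelines[comment.user].append(comment)
    return {user: sorted(items, key=lambda c: c.posted) for user, items in timelines.items()}
```

Attributes extracted from each comment (for example, drug names) could then be attached to these timelines so that changes for a given user over time become visible on a dashboard.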


Maternal Opioid Network Map (Mom)
Submitted by: Leanne Goldstein

Maternal opioid use is a growing public health concern. The Centers for Disease Control and Prevention (CDC) reports that maternal opioid use, as detected through neonatal abstinence syndrome (NAS), more than quadrupled between 1999 and 2014. CDC and its partners, including the MATernaL and Infant NetworK (MAT-LINK), are increasing their efforts to address this national concern, both to understand outcomes associated with the treatment of opioid use disorder during pregnancy and to develop guidance on how to treat patients with opioid use disorder during pregnancy.

However, identifying maternal opioid users accurately and in a timely manner remains a challenge, as patients are unlikely to self-report this behaviour because of strong societal stigma and the legal implications of child abuse. This makes it difficult to identify patterns of maternal opioid use with both traditional and innovative surveillance approaches. Here we explore a data science solution that uses social media to supplement existing maternal opioid use surveillance. While there are limitations, social media analysis can be used to identify institutions that are also monitoring this public health concern and whose insights can help to generate solutions collaboratively. It can also be used to monitor factors correlated with opioid use, such as reported use of tobacco, alcohol, benzodiazepines, cocaine and other substances of abuse, intersected with mentions of pregnancy and babies.
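
Intersecting reports of substance use with mentions of pregnancy and babies can be illustrated as a simple co-mention count. The sketch below is a hypothetical baseline: the term lists in `SUBSTANCE_TERMS` and `PREGNANCY_TERMS` are placeholders, whereas the solution described here would use terminology validated in the medical literature.

```python
import re
from collections import Counter

# Placeholder term lists (assumptions); replace with validated terminology.
SUBSTANCE_TERMS = {
    "opioid": ["opioid", "opioids", "heroin", "fentanyl", "oxy", "oxycodone"],
    "tobacco": ["cigarette", "cigarettes", "smoking", "vape"],
    "alcohol": ["alcohol", "drinking", "beer", "wine"],
    "benzodiazepine": ["benzo", "benzos", "xanax", "valium"],
    "cocaine": ["cocaine", "coke", "crack"],
}
PREGNANCY_TERMS = ["pregnant", "pregnancy", "baby", "babies", "newborn", "trimester"]


def _compile(terms):
    """Build one case-insensitive, whole-word regex for a list of terms."""
    return re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)


SUBSTANCE_RES = {name: _compile(terms) for name, terms in SUBSTANCE_TERMS.items()}
PREGNANCY_RE = _compile(PREGNANCY_TERMS)


def substance_pregnancy_mentions(text):
    """Substance categories mentioned in a post that also mentions pregnancy or babies."""
    if not PREGNANCY_RE.search(text):
        return set()
    return {name for name, rx in SUBSTANCE_RES.items() if rx.search(text)}


def count_intersections(posts):
    """Count, per substance category, the posts that co-mention pregnancy."""
    counts = Counter()
    for post in posts:
        counts.update(substance_pregnancy_mentions(post))
    return counts
```

Keyword co-mention is noisy (sarcasm, news headlines, unrelated uses of "coke"), so counts like these are best read as a screening signal rather than prevalence estimates.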

Our solution evaluates the use of Twitter for monitoring maternal opioid use. Twitter has been reported in the medical literature as a successful tool for public health and pregnancy surveillance. We selected Twitter for this data science solution because it is scalable and agile enough to meet the rapidly changing needs of the medical community examining maternal opioid use. Our Twitter analysis aims to reveal the network of institutions monitoring maternal opioid use and the patterns at the intersection of searches for maternal and opioid terminology. Our solution will use terminology validated in the medical literature and can be combined with other research methods, such as literature searches, to help narrow the scope and create collaborative prevention and treatment solutions.
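
One way to surface a network of institutions is to build a mention graph from tweets that contain both maternal and opioid terms, then rank accounts by how many distinct accounts mention them. The sketch below is a hypothetical heuristic, not the submitter's method: the regular expressions are placeholders for validated terminology, tweets are assumed to arrive as `(author, text)` pairs, and in-degree is only one possible centrality measure.

```python
import re
from collections import defaultdict

MENTION_RE = re.compile(r"@(\w{1,15})")  # Twitter handles are at most 15 characters
# Placeholder terminology (assumptions), not a validated term set.
MATERNAL_RE = re.compile(r"\b(pregnan\w*|maternal|prenatal|neonatal)\b", re.IGNORECASE)
OPIOID_RE = re.compile(r"\b(opioid\w*|opiate\w*|fentanyl|heroin|buprenorphine|methadone)\b", re.IGNORECASE)


def build_mention_network(tweets):
    """Directed author -> mentioned-account edges from tweets matching both term sets."""
    edges = defaultdict(set)
    for author, text in tweets:
        if MATERNAL_RE.search(text) and OPIOID_RE.search(text):
            for mentioned in MENTION_RE.findall(text):
                if mentioned.lower() != author.lower():  # ignore self-mentions
                    edges[author.lower()].add(mentioned.lower())
    return edges


def rank_by_in_degree(edges):
    """Rank accounts by the number of distinct accounts that mention them."""
    in_degree = defaultdict(int)
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))
```

Highly mentioned accounts are candidates for institutions active in this area and would still need manual review; a graph library such as NetworkX could replace the hand-rolled in-degree count if richer network measures are needed.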


Deep Social Media Mining Tool to Monitor Opioid Use in Pregnant Patients

Submitted by: Jyotiska Biswas

We propose building an intelligent system that examines social media users by monitoring public posts (e.g. tweets, shares, replies) through a set of state-of-the-art, deep-learning-based AI triggers. The system would identify users (e.g. patients) and assign them risk scores, support in-depth study of opioid usage among pregnant women, and enable preventive action for at-risk users, i.e. those at risk of relapse, abuse, addiction or susceptibility to use.

While Twitter and Reddit are rich sources for studying patients with opioid usage and substance abuse, identifying the users at risk of harmful side effects from opioids or of addiction is extremely challenging. This is primarily due to the habits and style with which users post content online: social media posts are often unreliable and full of false information that is irrelevant to the study, so traditional text mining tools perform poorly. Furthermore, traditional social media analytics tools focus on branding and impressions rather than case studies. In fact, numerous case studies conducted by health agencies on Twitter data (e.g. for studying epidemics or adverse reactions) have failed to reach conclusive findings because of the vast noise that floods social media feeds.

To reduce time to operation and the cost of hiring data scientists and engineers, we can sidestep many of the challenges of building a data mining engine by configuring our commercial off-the-shelf (COTS) product, ThinkTrends Social. ThinkTrends is a data mining engine that can be configured to capture and study structured and unstructured data (e.g. social media), with easy-to-use data labelling, business intelligence, and AI and deep learning (DL) automation tools all in one place. ThinkTrends Social is a custom version with built-in social connectors for Twitter and Reddit. This creates an ideal data science environment for conducting case studies on real-time social media data and for freely building, sharing and replicating deep learning models to filter out junk.

The initial phase will use two forms of AI/deep learning: computer vision to analyse images and videos on social media (e.g. detecting stages of pregnancy in public posts and recognising facial expressions of sentiment in photos); and the latest natural language understanding (NLU) techniques, with which we can train AI to learn patients' attitudes, vulnerability to relapse, intents, and the authenticity of Twitter posts by analysing them against past posts and retweets. The AI triggers will map the sentiments, emotions and intents expressed in patients' public posts to potential risk factors. Ultimately, analysts will use dashboards to study posts cohesively and quickly find at-risk users/patients, so that preventive measures can be suggested well ahead of time if needed.
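
The abstract does not say how trigger outputs become a risk score. As one hypothetical illustration, per-post trigger probabilities could be combined with a noisy-OR rule and a user scored by their riskiest post. The trigger names, weights and threshold below are all assumptions, and noisy-OR is a swapped-in technique chosen for simplicity, not the ThinkTrends method.

```python
from math import prod

# Hypothetical trigger names and weights in [0, 1]; in practice these would be
# learned from labelled data or set by analysts.
TRIGGER_WEIGHTS = {"relapse_intent": 0.9, "negative_emotion": 0.4, "drug_seeking": 0.8}


def post_risk(trigger_scores):
    """Noisy-OR: each trigger independently signals risk with probability weight * score.

    trigger_scores maps trigger name -> model probability in [0, 1].
    Unknown triggers get weight 0 and therefore contribute nothing.
    """
    no_risk = prod(1.0 - TRIGGER_WEIGHTS.get(name, 0.0) * score
                   for name, score in trigger_scores.items())
    return 1.0 - no_risk


def user_risk(posts_trigger_scores):
    """Score a user by their riskiest post (0.0 if the user has no posts)."""
    return max((post_risk(scores) for scores in posts_trigger_scores), default=0.0)


def flag_at_risk(users, threshold=0.5):
    """Return users at or above the threshold, highest risk first."""
    scored = {user: user_risk(posts) for user, posts in users.items()}
    return sorted((u for u, r in scored.items() if r >= threshold), key=lambda u: -scored[u])
```

Noisy-OR lets several weak signals add up to a high score while any single strong trigger can flag a user on its own; an analyst reviewing the dashboard would still make the final judgement.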


Mosaic Portrait of a Persona – Collecting Topic-Specific Social Media Data on Target Population to Piece Together a Persona of Pregnant Women and the Link to Opioid Usage

Submitted by: Jack Slattery

Our research centers on developing an approach to efficiently and effectively gather social media data on specific topics for a target population, in this case data on the use of opioids by pregnant, post-partum and reproductive-age women. Because this data is sparse and the target population is difficult to identify on larger social media platforms such as Facebook and Twitter, we introduce an AI-based information retrieval algorithm that is suited to identifying relevant posts from social media. We train the algorithm on Reddit data from opioid-related subreddits, using posts by users who have also participated in discussions in subreddits related to pregnancy and babies as an ap. Reddit discussions are organized into user-created areas of interest, the so-called "subreddits", which allow us to identify data on specific topics for the target population from their areas of interest.
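
The training-data selection described above (opioid-subreddit posts by users who also participate in pregnancy and baby subreddits) can be sketched as a simple author-overlap filter. The subreddit sets and the `RedditPost` record are hypothetical choices for illustration; the abstract does not list the subreddits used.

```python
from dataclasses import dataclass

# Hypothetical subreddit selections (assumptions), compared case-insensitively.
OPIOID_SUBREDDITS = {"opiates", "opiatesrecovery", "suboxone"}
PREGNANCY_SUBREDDITS = {"babybumps", "pregnant", "newparents"}


@dataclass(frozen=True)
class RedditPost:
    author: str
    subreddit: str
    text: str


def select_target_posts(posts):
    """Keep opioid-subreddit posts whose authors also posted in pregnancy/baby subreddits."""
    pregnancy_authors = {
        p.author for p in posts
        if p.subreddit.lower() in PREGNANCY_SUBREDDITS and p.author != "[deleted]"
    }
    return [
        p for p in posts
        if p.subreddit.lower() in OPIOID_SUBREDDITS and p.author in pregnancy_authors
    ]
```

Posts selected this way could serve as positive examples for training a retrieval model that is then applied to platforms without subreddit-style topic structure. Authors shown as "[deleted]" are excluded because their posts cannot be linked across subreddits.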

This algorithm will allow us to collect data on opioid use by pregnant women from any social media platform. In addition to the approach for collecting the right data, we will present our processes and the NLP and machine learning models we use to recognize named entities; to classify posts into the categories of drug initiation, transition to substance use disorder, and treatment and relapse; to identify correlating factors; and to conduct sentiment analysis to evaluate the effectiveness of treatments. Insights obtained from mining social media data can be cross-validated against insights from non-social media sources, such as CDC, CMS, EHR and other data, and can inform decisions on the prevention of drug abuse and the effectiveness of treatments. Because some social media data includes geolocation, mining it offers the additional benefit of reducing the time needed to identify highly impacted areas. It can also reduce costs by enabling more informed decisions on where and how to allocate resources to address this crisis. We will also discuss potential future steps in this domain.
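
Using geolocation to spot highly impacted areas could be as simple as ranking regions by the share of geotagged posts that an upstream classifier flags as opioid-related. The sketch below assumes posts arrive as hypothetical `(region, is_opioid_related)` pairs; the minimum-post threshold is an arbitrary guard against small-sample noise, not a recommended value.

```python
from collections import Counter


def impacted_regions(posts, min_posts=20, top_n=5):
    """Rank regions by the share of geotagged posts flagged as opioid-related.

    posts: iterable of (region, is_opioid_related) pairs. The region comes from
    post geolocation and the flag from an upstream classifier (both assumed).
    """
    totals, flagged = Counter(), Counter()
    for region, is_flagged in posts:
        if region is None:  # many posts carry no geolocation
            continue
        totals[region] += 1
        flagged[region] += int(is_flagged)
    # Skip regions with too few posts to give a stable rate.
    rates = {region: flagged[region] / n for region, n in totals.items() if n >= min_posts}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Rates rather than raw counts are used so that regions with many social media users do not dominate the ranking simply by volume; any such ranking would still need cross-validation against CDC or other non-social media data, as the abstract suggests.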

Challenge Stream Chairs:

James R. Johnson, PhD, Mitra Ahadpour and Catherine Li


