PHUSE/FDA Data Science Innovation Challenge – Drug Safety Surveillance Challenge Abstracts

A current challenge of the PHUSE/FDA Data Science Innovation Challenge is 'Drug Safety Surveillance', which looks to define an ideal process for surveillance of multiple data sources that would be useful in the development of a pharmacovigilance tool. Here you will find the abstracts of the accepted participants who have proposed a solution to this challenge.

Datacise ™ Integrated Safety Explorer
Submitted by: Eric Harvey

Pharmacovigilance is entering a new era in data sourcing for surveillance. The ability to capture safety data has grown manyfold, along with the opportunity to benefit from aggregated safety knowledge bases. Nonetheless, most life sciences organisations and regulatory agencies do not fully tap into this ever-growing safety knowledge base. It’s our opinion that they don’t have the luxury of allocating enough human resources to answer any questions beyond those most critical to safety.

To begin with, MMS searched for a worldwide unified registry for reporting adverse reactions to drug and biologic products. It was determined that such data is currently fragmented or unavailable and is suspected to be under-reported. There are many disconnected tools available that try to answer safety questions from multiple dimensions. MMS has taken a proactive approach to create integrated interactive data visualisations, which will allow quick comparisons between data from multiple traditional sources, as well as emerging sources like social media and genomics data.

For this challenge, we chose nine publicly available data sources that are relevant to the final goal. With advances in technologies it is easy to generate, source and parse safety data with a fair amount of data engineering knowledge. However, it takes a substantial amount of time for Safety Scientists or FDA reviewers to answer the question “why do we have a signal?”

Today’s analytics tools let end users generate static safety reports, interact with data via dashboards, or manually explore data in various formats. These tools can lend insight into a certain safety metric at a single point in time, but it remains difficult to ascertain in real time, and with certainty, why a safety metric is emerging or changing.

The MMS proprietary Datacise ™ platform Integrated Safety Explorer aims to:

  • Answer the ‘why’ behind a signal

  • Democratise safety data in conjunction with emerging sources like social media and genomics


Submitted by: Deb Piaseczynski

PRA’s Social Intelligence & Communities team researches how patients, caregivers, and HCPs are talking about a specific disease. This complementary data allows us to truly understand the voice of the patient: it helps us learn how patients describe their conditions, what their challenges and frustrations are, and what type of information they seek, and it helps us identify behavioural trends and activities. Working in collaboration with data scientists, therapeutic experts and clinical operations, PRA uses real world data, study protocols and site/country lists to develop queries within our global social intelligence platform. The team monitors, captures and analyses large volumes of online conversations and consumer digital behaviour data.

PRAView harnesses the power of our multiple social and medical databases and, using AI-powered social listening and social platform monitoring, targets specific keywords related to public conversations about AEs, side effects, symptoms, and other physical and emotional trends among indication-specific audiences. This information is visualised via a real-time dashboard that provides demographics (age, gender, geography, ethnicity), emotions, sentiment, drivers and conversation volume.

This automated system provides global, multilingual, multicultural monitoring across multiple social platforms 24/7. For each target mention, we will provide a direct link to the actual post, a screenshot, the time and report date, and author information (if available) within 24 hours of posting. Monthly reporting by channel provides basic analytics while uncovering potential trends.


Submitted by: Azia Tariq

Quickly identifying adverse events (AEs) for a newly launched drug can be critical to preventing further complications in patients and to understanding how a drug performs in the market after launch, without the controlled conditions of a clinical trial. Typical channels for reporting AEs include FAERS, which supports the FDA's post-marketing safety surveillance, and MedWatch, the FDA’s medical product safety reporting program for health professionals, patients and consumers.

Traditional reports can take time, so there may be a gap between when the drug launches and when the FDA receives enough AE reports to make a decision on a drug label change. These reports may also not account for patients who do not report an AE experience at all, leading to some AEs being underreported. For example, a drug that receives reports from medical providers may not account for AEs experienced by individuals who opt not to see a doctor. They may, however, choose to ask their social circle via social media.

A proof-of-concept tool that uses social media to fill the gap between launch and the point at which existing pharmacovigilance systems catch up (~6 months) has the potential to reduce this lag and give the FDA real-time information about a drug by scraping and analysing social media data. Combined with the current reporting systems, this will better enable the FDA to make informed decisions based on a comprehensive AE profile of the drug. The tool may flag potential new AEs that were not part of the clinical submission database/drug label, generating targets for further follow-up using existing systems. Using an R-Shiny dashboard, we can build a tweets database for each drug and help visualise AE spikes occurring on various social media platforms.

This proof-of-concept system identifies many of the challenges that must be addressed before a fully functional and scalable solution can be deployed on live-stream social media data. While development of the tool has been hindered by a lack of suitably tagged messages/datasets with which to train algorithms for this bespoke task, we outline paths forward with the end objectives in mind. Given the limited data we have access to, our current proof of concept will present unacceptably high false positive rates once the volume of social media messages and the low proportion of them reporting drug-related adverse events are taken into account. The short-run solution we employ recommends a manual review of all system-tagged messages until the system can be refined to achieve sufficient specificity. We also strongly encourage extending the project to cross-industry collaboration to produce an adequate training/test set with multiple language support.
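The base-rate effect described above can be illustrated with a short calculation. This is a minimal sketch, not the submitters' method: the message volume, prevalence, sensitivity and specificity below are hypothetical figures chosen only to show how a low proportion of AE messages drives down the share of flags that are genuine.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged messages that truly report an AE (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def flagged_counts(n_messages, sensitivity, specificity, prevalence):
    """Expected (true, false) flag counts in a batch of n_messages."""
    n_ae = n_messages * prevalence
    true_flags = sensitivity * n_ae
    false_flags = (1.0 - specificity) * (n_messages - n_ae)
    return true_flags, false_flags


# Hypothetical figures, for illustration only:
# 1,000,000 messages, 0.1% report an AE, classifier 90% sensitive, 95% specific.
true_flags, false_flags = flagged_counts(1_000_000, 0.90, 0.95, 0.001)
ppv = positive_predictive_value(0.90, 0.95, 0.001)
print(f"true flags: {true_flags:.0f}, false flags: {false_flags:.0f}, PPV: {ppv:.3f}")
```

Under these assumed figures, fewer than 2% of flagged messages would be genuine AE reports, which is why manual review of flagged messages and higher specificity both matter at this scale.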


Data fusion and deep learning for pharmacovigilance 
Submitted by: Zhichao Liu and Weida Tong

Protecting the safety of patients and consumers is the major mission of the FDA. This solution aims to develop deep neural network models for post market drug safety surveillance. Data from the FDA Adverse Event Reporting System (FAERS) will be collected and analysed to model adverse events following the use of a drug or therapeutic biologic product.

In addition, adverse events will be mined from social media data using deep neural language models, which can discover adverse events in texts posted by patients and consumers. Adverse events will be detected, assessed and monitored using the deep neural network model to protect public health.


Deep dissection of post market surveillance signal
Submitted by: Dong Wang and Weida Tong

Monitoring and acting on safety signals from post market surveillance is a critical component of FDA’s mission to protect patient safety. Databases like the FDA Adverse Event Reporting System (FAERS) have been extensively analysed to mine safety signals and provide alerts for potential regulatory actions.

However, the current methods tend to focus on high reporting frequencies in general, while subpopulation-focused analysis has not been sufficiently utilised. In this proposal, we will provide new statistical and data mining tools for the deep dissection of post market surveillance signals, so that alerts can be given for different sexes, age groups, and other subpopulations. This approach will not only provide a means for more focused alerts; it can also detect subpopulation-specific signals that would be diluted in a general analysis.
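The dilution effect can be shown with a small stratified calculation. The abstract does not name a specific statistic, so this sketch swaps in the proportional reporting ratio (PRR), one common disproportionality measure, as an assumption. The report records, field names, drug "X" and event "rash" are all synthetic and hypothetical.

```python
from collections import defaultdict


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table.

    a: reports with the drug and the event   b: the drug, other events
    c: other drugs with the event            d: other drugs, other events
    """
    if a + b == 0 or c == 0:
        return float("nan")
    return (a / (a + b)) / (c / (c + d))


def stratified_prr(reports, drug, event, key):
    """Return (overall PRR, {level: PRR}) for one subpopulation field."""
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r in reports:
        is_drug = r["drug"] == drug
        is_event = r["event"] == event
        # Cell index: 0 = a, 1 = b, 2 = c, 3 = d.
        cell = (0 if is_event else 1) if is_drug else (2 if is_event else 3)
        tables[r[key]][cell] += 1
    overall = [sum(col) for col in zip(*tables.values())]
    by_level = {level: prr(*t) for level, t in tables.items()}
    return prr(*overall), by_level


def make_reports(sex, drug, event, n):
    """Build n identical synthetic reports (illustration only)."""
    return [{"sex": sex, "drug": drug, "event": event}] * n


reports = (
    make_reports("F", "X", "rash", 20) + make_reports("F", "X", "other", 80)
    + make_reports("F", "Y", "rash", 50) + make_reports("F", "Y", "other", 950)
    + make_reports("M", "X", "rash", 5) + make_reports("M", "X", "other", 395)
    + make_reports("M", "Y", "rash", 50) + make_reports("M", "Y", "other", 950)
)
overall, by_sex = stratified_prr(reports, "X", "rash", "sex")
print(f"overall: {overall:.2f}")
for level, value in sorted(by_sex.items()):
    print(f"{level}: {value:.2f}")
```

In this synthetic data the pooled PRR is 1.00, which suggests no signal, while the female stratum has a PRR of 4.00. A real analysis would also need confidence intervals and minimum case counts before raising an alert.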


A deep learning phylogenomic approach to advance post market safety surveillance of drugs and biologic products
Submitted by: Pere Puigbo

Artificial Intelligence (AI) systems can be trained to find statistical structures in large and complex datasets. These techniques are currently in high demand in several industries to analyse unstructured datasets from the Internet of Things, such as text, images, real-time and spatial data. Several AI systems, including machine learning and deep learning algorithms, can be used for post market safety surveillance of drugs and biologic products. However, real-time systems present an NP-problem that should be approached with heuristic methods. Heuristic algorithms perform analyses much faster, but reduce precision and accuracy.

Here I propose a solution to speed up screening time by performing a comprehensive phylogenomic classification of potential targets of biological products. In other words, I will use phylogenetic trees to classify target proteins/genes, and each clade of the tree will have an associated risk score that may change over time. The proposed solution is divided into three work packages (WP), and the specific aims are the following:

WP1. Phylogenomics classification: to classify targets (genes and proteins from the KEGG database) of drugs and biological products using phylogenomics methods (including sequence analysis, phylogenetic trees and multifactorial analyses).

WP2. Microevolutionary analysis: to identify what kinds of targets are more prone to safety issues and how this changes over time. I will use the phylogenetic trees from WP1 to determine how the safety score changes over time, in a controlled microevolutionary scenario.

WP3. Screening with deep learning: to find safety risks associated with drugs and biological products. The aim of WP3 is to determine how the reduction of data dimensionality (reduction of the number of targets through phylogenetic clustering) facilitates real-time monitoring and surveillance. The goal is to reduce prediction time without losing predictive power.


Counterfeit Drug Detection
Submitted by: Xin Liu

Counterfeit drugs cost the lives of hundreds of thousands of people around the world every year. Traditional counterfeit drug detection solutions are manual, labour intensive, ineffective, costly and intrusive. The Precise team has developed an innovative counterfeit drug detection solution based upon the latest all-spectrum image recognition and on-the-edge TPU-based machine learning technologies.

Our solution has been proven to be accurate, automatic, near real-time, low-cost and non-intrusive. In this challenge, we’d like to discuss the concept, the hardware design, the deep learning model for counterfeit drug detection built on AutoML, and the end-to-end process for image augmentation, image processing and ML model training.


Early Safety Signal Detection from Public Internet Sources
Submitted by: Jason Meyer

Post-marketing surveillance of FDA regulated products is critical for identifying potential adverse events (AEs) in the real-world population. An infrastructure leveraging near real time data to facilitate early signal detection may aid FDA’s mission in continual assessment of products’ risk profiles.

Our goal is to establish an active surveillance system that collects, annotates, standardises, evaluates and presents data from public open sources, using advanced data science technology to enhance FDA’s ability to detect safety signals early.

We propose to use an artificial intelligence (AI) based approach to detect early safety signals from social media sites, such as Reddit, Twitter, or WebMD, as AI techniques excel at extracting meaningful patterns from large volumes of ambiguous data. To augment the AI-based detection system, signals detected from social media data can be evaluated in the context of other data sources, such as the FDA reporting systems FAERS and VAERS.

We will establish an infrastructure to mine social media data for safety signals in near real time through the following steps:

– Select social media sites and collect data where potential AEs are reported

– Extract language fragments from the sample data using natural language processing techniques

– Annotate key words from the sample data into standardised AE terminologies, supplemented with key words from FAERS and VAERS

– Build an extensive list of key words for AEs from the initial list by applying word embedding techniques on a large sample of social media data

– Build a supervised ML model for determining potential safety signals

– Aggregate and present the model results in a dynamic interpretable dashboard with geographic and demographic information
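The annotation and aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed system: the lay-phrase lexicon, the function names and the example posts are all hypothetical, and simple phrase matching stands in for the natural language processing, word embedding and supervised ML components the abstract describes.

```python
import re
from collections import Counter

# Hypothetical lay-phrase lexicon mapping to standardised AE terms. A real one
# would be curated against a standard terminology and extended, for example
# with word embeddings as the steps above propose.
LEXICON = {
    "threw up": "Vomiting",
    "throwing up": "Vomiting",
    "can't sleep": "Insomnia",
    "cant sleep": "Insomnia",
    "headache": "Headache",
    "head is pounding": "Headache",
}


def normalise(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def annotate(post):
    """Return the set of standard AE terms whose lay phrases occur in a post."""
    text = normalise(post)
    found = set()
    for phrase, term in LEXICON.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(term)
    return found


def aggregate(posts):
    """Count posts mentioning each standard AE term (one count per post)."""
    counts = Counter()
    for post in posts:
        counts.update(annotate(post))
    return counts


# Invented example posts, for illustration only.
posts = [
    "Started the new med on Monday and I can\u2019t sleep at all",
    "Threw up twice after my second dose, also a headache",
    "My head is pounding and I have a headache since switching",
    "Loving the weather today",
]
counts = aggregate(posts)
print(counts.most_common())
```

Counting each term once per post keeps a single long post from inflating a term's total. The resulting per-term counts are the kind of aggregate a dashboard could then break down by geography or demographics.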

This infrastructure will likely complement FDA’s current surveillance networks, enhancing the early detection of safety signals warranting further investigation and systematic examination.


Challenge Stream Chairs:

Patricia Hegarty and Suranjan De
