Post-marketing surveillance of FDA-regulated products is critical for identifying potential adverse events (AEs) in the real-world population. An infrastructure that leverages near real-time data to facilitate early signal detection may support the FDA’s mission of continually assessing products’ risk profiles.
Our goal is to establish an active surveillance system that collects, annotates, standardizes, evaluates, and presents data from publicly available sources, using advanced data science technology to strengthen the FDA’s capacity for early safety signal detection.
We propose an artificial intelligence (AI)-based approach to detecting early safety signals on social media sites such as Reddit, Twitter, and WebMD, because AI techniques excel at extracting meaningful patterns from large volumes of ambiguous data. To augment the AI-based detection system, signals detected in social media data can be evaluated in the context of other data sources, such as the FDA Adverse Event Reporting System (FAERS) and the Vaccine Adverse Event Reporting System (VAERS).
We will establish an infrastructure to mine social media data for safety signals in near real time via the following steps:
– Select social media sites where potential AEs are reported and collect data from them.
– Extract language fragments from the sample data using natural language processing techniques.
– Map keywords from the sample data to standardized AE terminologies, supplemented with keywords from FAERS and VAERS.
– Expand the initial list into an extensive list of AE keywords by applying word embedding techniques to a large sample of social media data.
– Build a supervised machine learning (ML) model to identify potential safety signals.
– Aggregate and present the model results in a dynamic, interpretable dashboard with geographic and demographic information.
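As a rough illustration of the keyword mapping and expansion steps above, the following minimal sketch grows a seed list of AE keywords by cosine similarity in an embedding space and then tags a post with standardized terms. Everything in it is an assumption made for illustration: the seed lexicon, the three-dimensional toy vectors (a stand-in for embeddings trained on a large social media sample), the similarity threshold, and the function names `expand_lexicon` and `tag_post` are hypothetical and do not describe the actual system.

```python
import math
import re

# Hypothetical seed lexicon: colloquial keyword -> standardized AE term.
# The terms are placeholders, not entries from a specific terminology.
SEED_LEXICON = {
    "headache": "Headache",
    "nausea": "Nausea",
}

# Toy vectors standing in for word embeddings learned from social media text.
TOY_EMBEDDINGS = {
    "headache": [0.90, 0.10, 0.00],
    "migraine": [0.85, 0.15, 0.05],
    "nausea": [0.10, 0.90, 0.00],
    "queasy": [0.15, 0.85, 0.10],
    "pharmacy": [0.00, 0.10, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed, embeddings, threshold=0.95):
    """Add words whose vector is close to a seed keyword's vector.

    Each new word inherits the standardized term of its most similar
    seed keyword, provided the similarity reaches the (assumed) threshold.
    """
    expanded = dict(seed)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        best_term, best_sim = None, threshold
        for keyword, term in seed.items():
            if keyword not in embeddings:
                continue
            sim = cosine(vec, embeddings[keyword])
            if sim >= best_sim:
                best_term, best_sim = term, sim
        if best_term is not None:
            expanded[word] = best_term
    return expanded


def tag_post(text, lexicon):
    """Return the sorted standardized AE terms mentioned in a post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted({lexicon[t] for t in tokens if t in lexicon})


lexicon = expand_lexicon(SEED_LEXICON, TOY_EMBEDDINGS)
tags = tag_post("Got a migraine and felt queasy after the new med", lexicon)
```

In this toy setup, "migraine" and "queasy" are added to the lexicon because their vectors lie close to "headache" and "nausea", while "pharmacy" is left out. In practice the threshold would need tuning, and expanded keywords would likely need manual review before feeding a supervised model.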
This infrastructure will likely complement the FDA’s current surveillance networks, enhancing the early detection of safety signals warranting further investigation and systematic examination.