Data Visualisation and Open Source Technology in Clinical Research

Working Group & Project Scope

Working Group Leads 

Hanming Tu

Mike Stackhouse 

Data Visualisation and Open Source Technology in Clinical Research (DVOST) aims to address and answer pertinent questions around Data Visualisation and Open Source Technology in the Pharmaceutical Industry. The two subjects combine naturally, given the powerful Data Visualisation tools now available within Open Source languages. Some of the questions that we intend to address are:

  • How do you safely use Open Source languages for analytics and submissions within a Regulatory environment?
  • What are the potential uses of Open Source software within a company outside of data analysis for a submission?
  • How can interactive visualisations be leveraged appropriately within a clinical environment?
  • What are the best practices for creating powerful interactive visualisations?


Open Source Technologies for Regulatory Submissions

Eli Miller

While interoperability and standardisation have been goals of the Pharmaceutical Data Science industry for years, much of the work to create and validate a submission package is done manually or with proprietary software. Integrating tabular study data, study metadata, STF data, and visualisations is low-hanging fruit for a collaborative industry solution. Open Source tools have matured in their reliability and flexibility, and this paper will explore their emerging use in regulatory submissions. It will discuss the tools that assist in creating a submission package, the cost of these tools, and the controls and validation needed to create and maintain a compliant eCTD package.

Repository Governance and Infrastructure

Hanming Tu 

Repository governance and infrastructure

Julia Initiative for PV Compliance

Hanming Tu and Chris Hurley

Explore the Julia language for standard analyses, using PV compliance as an example.

R Package Validation Framework

Ellis Hughes

There is consensus across industry that software used for regulatory submissions needs to be validated. However, there are few industry-accepted guidelines on what is specifically required to meet the bar of validation for Open Source tools and user-contributed extensions, specifically R and R packages.

There are two deliverables from this project: a White Paper and an R package. The White Paper will serve as a reference for industry on how to perform validation for user-contributed extensions of programming software. It will detail the elements that need to be met for the extension to be validated, and ways to document the process in a reusable, efficient, and sharable fashion. The R package will be developed to provide the tools and guidance for validating an R package. It will show how to take advantage of the tools that exist in the R language, and it will be based on the White Paper to ensure the baseline requirements are achieved.

With these tools, there will be a standard to reference on how to approach validation of extensions to the R language. Additionally, by demonstrating how to perform validation, the framework can be more generally applied to software extensions in other languages.

GPP in Macro Development

Virginia (Ginny) Redner

Mark Foxwell

Macros provide an effective way to automate and reuse code in a standard and consistent manner across SAS programs. Because macro code is reused so widely, good programming practice (GPP) is particularly important in it, and we see a need to develop a consensus on, and to document, good programming practices specifically for macro programming. This project intends to create a guideline/White Paper for writing well-structured and precisely documented macro code that is easy to read and maintain over time. The proposed White Paper will primarily describe:

  • Coding style of macros
  • Best practices while writing macros
  • Structured documentation of macros
  • Optimisation and reducing compile time when using macros
  • Refactoring in macros

R-shiny Interactive Forest Plot in Collaboration with ASA


Bryant Chen

Dhananjay Chhatre 

Melvin Munsaka

Development of R-shiny application(s) to enable the generation of identified plots for direct inclusion in submission packages for regulatory agencies. The initial scope is to develop tools to generate forest plots for inclusion in submissions to the FDA. This work is carried out in conjunction with the American Statistical Association (ASA).

Best Practices for Interactive Analyses for Decision Making Submissions

Xiangyun 'Sharon' Wang

Zachary Skrivanek


Leverage Visual Analytics in submission packages to improve interactions between regulatory agencies and sponsors, enabling a fast and effective review of submitted data and analyses.

Test Dataset Factory 

Dante Di Tommaso

Several Working Group projects develop and specify medical research methods, features, or processes, and some even create software components or sub-systems for common tasks in drug development. As part of these efforts, a variety of SDTM or ADaM test datasets are required. The typical fallback position of project teams is to use data from the CDISC pilot project and/or anonymised study data that are provided by project team members. The Test Dataset Factory project aims to provide test data formatted in SDTM and ADaM that support a more systematic and comprehensive testing of these concepts and scripts.
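As a rough illustration of what systematically generated test data might look like, the sketch below builds a small synthetic demographics table loosely modelled on the SDTM DM domain. It is not the project's method or output: the function name make_dm, the study identifier TEST01, and the value ranges are assumptions made purely for the example. Only a handful of DM variables are included.

```python
import random

# Hypothetical helper (not from the Test Dataset Factory project): builds a
# tiny synthetic table loosely modelled on the SDTM DM domain. The study ID,
# arm names, and age range are illustrative assumptions.
def make_dm(n_subjects, study_id="TEST01", seed=0):
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    arms = ["Placebo", "Active"]
    rows = []
    for i in range(1, n_subjects + 1):
        rows.append({
            "STUDYID": study_id,
            "DOMAIN": "DM",
            "USUBJID": f"{study_id}-{i:04d}",  # unique subject identifier
            "AGE": rng.randint(18, 85),        # inclusive bounds
            "SEX": rng.choice(["M", "F"]),
            "ARM": rng.choice(arms),
        })
    return rows

dm = make_dm(5)
```

Seeding the generator means the same call always returns the same records, which is one simple way to make tests that rely on the data repeatable.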


