Data Visualisation and Open Source Technology 

Working Group & Project Scope

Working Group Leads 

Hanming Tu

Mike Stackhouse 

The Data Visualisation and Open Source Technology Working Group aims to address pertinent questions around Data Visualisation and Open Source Technology. Combining the two subjects is natural given the powerful Data Visualisation tools available in today’s Open Source languages. The questions we intend to address include:

  • How do you safely use Open Source languages for analytics and submissions within a Regulatory environment?
  • What are the potential uses of Open Source software within a company outside of data analysis for a submission?
  • How can interactive visualisations be leveraged appropriately within a clinical environment?
  • What are the best practices for creating powerful interactive visualisations?


Best Practices for Interactive Analyses for Decision Making Submissions

Xiangyun 'Sharon' Wang

Zachary Skrivanek


Leverage Visual Analytics in our submission package to improve interactions between regulatory agencies and sponsors for a fast and effective review of submitted data and analyses.

Clinical Statistical Reporting in a Multilingual World

Mike Stackhouse

Michael Rimler

Several discrepancies have been discovered in statistical analysis results between different programming languages, even in fully qualified statistical computing environments. Subtle differences exist between the fundamental approaches implemented by each language, yielding results that differ yet are each correct in their own right. These differences cause unease among sponsor companies when submitting to a regulatory agency, as it is uncertain whether the agency will view them as problematic. Understanding the agency’s expectations will contribute significantly to enabling the broader adoption of multiple programming languages in the production of data submission packages for regulatory review.

The Clinical Statistical Reporting in a Multilingual World project seeks to clearly define this problem and provide a framework for assessing the fundamental differences for a particular statistical analysis across languages. With such a framework, the risk of misinterpreting numerical differences in analysis results that are due solely to differences in programming language can be mitigated, instilling confidence in both the sponsor company and the agency during the review period. This will be accomplished by:

  1. Identifying common statistical analyses performed during submissions to narrow the scope of where discrepancies must be identified (e.g., continuous summaries, frequency counts, hazard models, bioequivalence testing, steady-state assessments, bioavailability testing, ANOVA)
  2. Providing necessary documentation to produce equivalence in results between separate statistical analysis software packages/languages (where possible)
  3. Evaluating and documenting differences in results between popular statistical analysis implementations as use cases
  4. Providing sample code for use cases through a publicly accessible code repository for both review and consumption
  5. Promoting the notion that the ‘right’ implementation of a particular statistical analysis should be based on sound statistical reasoning and not limited by the capabilities of a specific programming language or statistical analysis software package, nor its default settings
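
The kind of discrepancy described above can be illustrated with rounding. This is a minimal sketch, not part of the project's deliverables. Two widely used tie-breaking rules, round half to even and round half away from zero, each give a defensible answer yet disagree on values that sit exactly halfway between two candidates. The function names below are hypothetical, and Python's standard decimal module stands in for whichever statistical languages are being compared.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def _quantum(places: int) -> Decimal:
    """Return the target precision, e.g. places=2 gives Decimal('0.01')."""
    return Decimal(1).scaleb(-places)


def round_half_even(value: str, places: int = 0) -> Decimal:
    """Round ties to the nearest even digit (often called banker's rounding)."""
    return Decimal(value).quantize(_quantum(places), rounding=ROUND_HALF_EVEN)


def round_half_away(value: str, places: int = 0) -> Decimal:
    """Round ties away from zero (decimal calls this ROUND_HALF_UP)."""
    return Decimal(value).quantize(_quantum(places), rounding=ROUND_HALF_UP)


def find_discrepancies(values, places: int = 0):
    """List the inputs on which the two rounding rules disagree.

    Values are passed as strings so that they are represented exactly
    and binary floating point does not blur the comparison.
    """
    rows = []
    for value in values:
        even = round_half_even(value, places)
        away = round_half_away(value, places)
        if even != away:
            rows.append((value, even, away))
    return rows
```

Calling `find_discrepancies(["0.5", "1.5", "2.5", "2.4"])` flags only the ties where the two rules part ways. Documenting the rule behind each result, rather than declaring one of them wrong, is the spirit of the steps listed above.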

GPP in Macro Development

Virginia (Ginny) Redner

Mark Foxwell

Macros provide an effective way to automate and reuse code in a standard and consistent manner across SAS programs. Because macro code is reused so widely, Good Programming Practice (GPP) is particularly important in it, and we see a need to develop a consensus on, and to document, good programming practices specifically for macro programming. This project intends to create a guideline/White Paper for creating well-structured and precisely documented macro code that will be easy to read and maintain over time. The proposed White Paper will primarily describe:

  • Coding style of macros

  • Best practices while writing macros

  • Structured documentation of macros

  • Optimisation and reducing compile time when using macros

  • Refactoring in macros

Julia Initiative for PV Compliance

Hanming Tu and Chris Hurley

Explore the Julia language for standard analysis, using PV compliance as an example.

Open Source Technologies for Regulatory Submissions

Eli Miller

While interoperability and standardisation have been goals of the Pharmaceutical Data Science industry for years, much of the work to create and validate a submission package is done manually or with proprietary software. Integrating tabular study data, study metadata, STF data, and visualisations is low-hanging fruit for a collaborative industry solution. Open-source tools have matured in their reliability and flexibility. This paper will explore their emerging use in regulatory submissions. It will discuss the tools that assist in creating a submission package, the cost of these tools, and the controls and validation needed to create and maintain a compliant eCTD package.

Repository Governance and Infrastructure

Hanming Tu 

Repository governance and infrastructure

R Package Validation Framework

Ellis Hughes

There is consensus across industry that software used for regulatory submission needs to be validated. However, there are few industry-accepted guidelines on what is specifically required to meet the bar of validation for Open Source tools and user-contributed extensions, specifically R and R packages.

There are two deliverables from this project: a White Paper and an R package. The White Paper will serve as a reference for industry on how to perform validation for user-contributed extensions of programming software. It will detail the elements that need to be met for the extension to be validated, and ways to document the process in a reusable, efficient, and sharable fashion. The R package will be developed to provide the tools and guidance for validating an R package. It will show how to take advantage of the tools that exist in the R language, and it will be based on the White Paper to ensure the baseline requirements are achieved.

With these tools, there will be a standard to reference on how to approach validation of extensions to the R language. Additionally, by demonstrating how to perform validation, the framework can be more generally applied to software extensions in other languages.

R-shiny Interactive Forest Plot in Collaboration with ASA


Bryant Chen

Dhananjay Chhatre 

Melvin Munsaka

Development of R-shiny application(s) to enable the generation of identified plots for direct inclusion in submission packages for regulatory agencies. The initial scope is to develop tools to generate forest plots for inclusion in submissions to the FDA. This work is carried out in conjunction with the American Statistical Association (ASA).

Test Dataset Factory 

Dante Di Tommaso

Several Working Group projects develop and specify medical research methods, features, or processes, and some even create software components or sub-systems for common tasks in drug development. As part of these efforts, a variety of SDTM or ADaM test datasets are required. The typical fallback position of project teams is to use data from the CDISC pilot project and/or anonymised study data that are provided by project team members. The Test Dataset Factory project aims to provide test data formatted in SDTM and ADaM that support a more systematic and comprehensive testing of these concepts and scripts.


