Impact of the COVID-19 Pandemic on the Collection and Analysis of Missing Data in Ongoing Clinical Trials

Missing Data Considerations for Clinical Data Scientists

Jaishri Alladi, Sangeeta Bhattacharya, Adam Czernik, Amy Gillespie, Ying Shi, Chris Wells


The following blog discusses risks, mitigations, monitoring, and reporting of missing data for clinical trials conducted during the COVID-19 pandemic. These considerations are intended for clinical data scientists and are based on current practice, regulatory guidance and literature review. It is our intent to share these points to help inform our industry colleagues and suggest some best practices during this unprecedented time. We caution, however, that efforts are quickly evolving, and we are learning more each day. We look forward to your feedback as we travel this journey together.

As noted in recent trial conduct guidance published by the US Food and Drug Administration, “Changes in study visit schedules, missed visits or patient discontinuations may lead to missing information (e.g. for protocol-specified procedures). It will be important to capture specific information in the case report form that explains the basis of the missing data, including the relationship to COVID-19 for missing protocol-specified information (e.g. from missed study visits or study discontinuations due to COVID-19). This information, summarised in the clinical study report, will be helpful to the sponsor and FDA” (FDA, 2020).

The following bullets describe some expected scenarios of missing data during the COVID-19 pandemic:   

  • Early patient discontinuation might leave the value of the primary outcome missing.
  • Missed visits might result in missing assessments, procedures not performed, lab samples not collected, etc.
  • Drug supply delays might cause delayed or missed doses.
  • Patient-reported outcomes might not be consistently recorded for various pandemic-related reasons, including cancelled or delayed visits.
  • Paper diaries might not be returned to the site for data entry in time for DMC or other decision-making.
  • Data might be permanently missing due to unresolved queries or unverified data.
  • Site closures, or site staff being unavailable for data entry, might leave data uncaptured even though patients visited the site.

The remaining sections of this blog discuss some preventive actions to reduce missingness, innovative ways to monitor missing data and thoughtful approaches to report missing data. We close with some thoughts on how much missing data is too much.


Prevention of Missing Data and Mitigations to Consider

Mitigating the impacts of missing data is important in every trial. The pandemic, however, has the potential to increase the amount of missing data. As described in the EMA Guidance on the Management of Clinical Trials During the COVID-19 Pandemic, “Various challenges exist which result in restrictions of visits to healthcare facilities, increased demands on the health service and changes to trial staff availability. Trial participants may also be required to self-isolate, which can make it difficult for investigators to maintain their medical oversight. These challenges could have an impact on the conduct of trials, such as the completion of trial assessments, completion of trial visits and the provision of Investigational Medicinal Products” (EMA, 2020). As such, it is critical to carefully plan and implement strategies that minimise the likelihood of missing data. Any such approach, however, must put the safety of the participants first.

Strategies to address missing data have been widely documented. We list below a few study conduct and data collection strategies that are particularly relevant during the pandemic:

  • Reduce required visits and amount of data collected.
  • Expand visit windows.
  • Adopt data collection methods that don’t require face-to-face visits, such as video visits.
  • Utilise home nursing when participants are unable to travel to the site.
  • Support the use of local laboratories.
  • Contact participants via phone or use telemedicine.
  • Enhance participant contact; keep participants engaged in the study with incentives, visit reminders, phone calls to monitor status.
  • Keep in mind that changes in data collection methods might require a protocol amendment.

The mitigation of missing data in clinical trials is a team sport. All members of the clinical trial team involved in the design and execution of a clinical trial have a role to play in increasing retention and reducing missing data (O’Kelly, 2014, p. 40). A clinical data scientist is no exception. While many mitigation strategies to reduce the volume of missing data are operational and a focus of the clinical trial site staff, the clinical data scientist must also carefully understand and evaluate the mitigation strategies being considered. 

The clinical data scientist should provide input on the data collection strategies and on the feasibility of using the collected data for statistical analysis and reporting. The clinical data scientist has an important role in educating and informing clinical trial team members on the use of the clinical data for downstream analysis needs. As an example, the clinical data scientist should evaluate the impact of using data collected from a local lab instead of a central lab. The clinical data scientist should also ensure that data collected by other means, such as telemedicine for AE reporting, are properly captured in the clinical trial database, keeping in mind that the data must remain usable by statistical analysis and reporting programs. (For example, verbatim and non-standard text reported in a comment field or bubble may be difficult to consume programmatically and therefore ineffective for statistical analyses and reporting.)

Besides providing input to the design and execution of a clinical trial to increase retention and reduce missing data, the clinical data scientist must remain flexible, as new data presentations may be needed in-life to evaluate missingness and identify any potential shortcomings of the available data. The talents of a clinical data scientist in handling, manipulating and synthesising the collected data in order to monitor the trial, build knowledge and inform the clinical trial team about the degree of missingness are critically important.


Monitoring Missing Data for Ongoing Trials

With all these reasons for missing data in mind, the methods for reviewing clinical trial data must be examined and mitigating actions put in place. This might mean adjusting existing processes, or it might mean reviewing clinical data in new, alternative ways.

Protocol Deviations

The collection of protocol deviations should be reviewed. If it is company practice to report only MAJOR deviations, it is recommended to explore the MINOR deviations for any textual mention of COVID-19, bearing in mind that many variants of the text may have been used. Site staff should be advised to use a standard text, e.g. COVID-19, at the beginning of any deviation description. However, one company that adopted this convention still noted 14 different spellings, including “COVID – 19” and “*COVID-19”, so the convention must be followed exactly and searches must allow for variants. It is also recommended that any MINOR deviations referring to COVID-19 be escalated to MAJOR deviations, particularly if the deviation relates to missed visits or assessments for primary efficacy or safety parameters. It may be important to document the reasons for missed visits for future regulatory review. If the method of data capture does not support this, alternative methods of collecting this information, such as protocol deviations, should be considered.
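A tolerant text search helps when site staff have entered the standard text inconsistently. The sketch below (hypothetical deviation texts; the pattern covers common variants but is not exhaustive) flags deviation descriptions that appear to mention COVID-19:

```python
import re

# Match common spellings seen in free text: "COVID-19", "COVID 19",
# "COVID – 19" (en dash), "*COVID-19", "coronavirus", "SARS-CoV-2", etc.
COVID_PATTERN = re.compile(
    r"(covid[\s\u2013\u2014-]*19|coronavirus|sars[\s-]*cov[\s-]*2)",
    re.IGNORECASE,
)

def mentions_covid(description: str) -> bool:
    """Return True if a deviation description appears to reference COVID-19."""
    return bool(COVID_PATTERN.search(description))

# Hypothetical MINOR deviation descriptions to scan
deviations = [
    "Visit 4 missed due to COVID – 19 travel restrictions",
    "*COVID-19: site closed, assessment not performed",
    "Sample shipped 2 days late (courier error)",
]
flagged = [d for d in deviations if mentions_covid(d)]
```

Flagged deviations can then be reviewed manually and escalated to MAJOR where appropriate.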


Critical Variables

Teams need to check that critical variables are complete and that they are source data verified (SDV) and source data reviewed (SDR) to a level at which the study team is confident the interpretation of the endpoint data will not be negatively affected.

Sites should be notified of the critical variables of utmost importance. If the risk impact to these variables is high, it may be necessary to consider delaying data cuts or database locks, which in turn may lead to a review or amendment of the protocol. Further, any risk assessment tools would need to be enhanced with a new risk and mitigating method to account for COVID-19.

It is recommended to put Remote Monitoring Guidance into place where law permits: remote data review (RDR) conducted by various functional departments, and remote site contact, e.g. telephone calls and emails, conducted by CRAs. Rather than updating functional plans (e.g. Trial Monitoring Plans, Statistical Analysis Plans), it may be advisable to create functional plan addendums. Once the pandemic is over, the original functional plan can then be returned to use and the addendum filed in the trial master file.

It is proving very useful to create Shiny apps and/or Spotfire reports to help study teams review the number and percentage of missing visits, missing events or critical assessments. Innovative visualisations highlight the impact and missingness over time and enable the reviewer to monitor the extent of the impact in real time. Further, the incidence (rate) of adverse event or protocol deviation reporting and its association with the COVID-19 status of each country can be investigated using data from the WHO website. This provides insight into how event reporting has been affected by lockdowns in individual countries and how quickly reporting recovers.
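As a sketch of the kind of summary such a review app might compute (the site numbers, months and visit counts below are hypothetical), the percentage of missed visits per site and month can be derived directly from expected and missed visit counts:

```python
from collections import defaultdict

# Hypothetical data: (site, month, expected visits, missed visits)
records = [
    ("1001", "2020-02", 20, 1),
    ("1001", "2020-03", 22, 6),
    ("1001", "2020-04", 21, 9),
    ("2002", "2020-02", 15, 0),
    ("2002", "2020-03", 16, 4),
    ("2002", "2020-04", 14, 7),
]

# Percent of missed visits per site and month, the measure a Shiny or
# Spotfire dashboard would typically plot over time
pct_missed = defaultdict(dict)
for site, month, expected, missed in records:
    pct_missed[site][month] = round(100 * missed / expected, 1)

for site, by_month in sorted(pct_missed.items()):
    print(site, by_month)
```

Plotting these percentages by month makes the onset of pandemic-driven missingness, and any recovery, immediately visible per site.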

While protocol deviations have been discussed in a separate blog, information collected via protocol deviations can be used to identify missing planned assessments (for example, primary and secondary efficacy and safety endpoints). One possible way to aid the quantitative assessment (including visualisation) of missingness is to create an analysis dataset that contains, per participant and per assessment of interest, all expected visits/visit dates (actual and missed) through a cut-off point: a predefined point of censoring beyond which no assessments are expected. In an oncology study, for example, this could be the date of progression; for other therapeutic areas, it could be the date of discontinuation for a participant who discontinued study participation. The dataset would include flags or indicator variables to associate each missing observation with a reason (e.g. participant infection reported as an AE, or temporary measures such as site closure). The association of missing data with its reason may need to be derived (based on a well-documented set of data handling rules) unless collected explicitly in the eCRF. Such derivation may entail estimating missing assessment dates, creating visit windows, and associating them with AE onset dates, protocol deviation start/end dates, etc. It is also important to understand how such a dataset may be used not only for ongoing assessment of missingness and data monitoring activities but also for reporting (CSR, submission).
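The dataset described above can be sketched as follows. This is a minimal, hypothetical illustration: the 28-day visit schedule, subject identifier, dates and reason text are all assumptions, and in practice visit windows and reason derivation would follow documented data handling rules.

```python
from datetime import date, timedelta

VISIT_INTERVAL = timedelta(days=28)  # assumed 4-weekly visit schedule

def expected_visits(first_dose: date, cutoff: date, interval=VISIT_INTERVAL):
    """Yield (visit_number, planned_date) through the censoring cut-off."""
    n, planned = 1, first_dose + interval
    while planned <= cutoff:
        yield n, planned
        n, planned = n + 1, planned + interval

def missingness_rows(subject, first_dose, cutoff, attended, reasons):
    """One row per expected visit, flagging missed visits and known reasons."""
    rows = []
    for n, planned in expected_visits(first_dose, cutoff):
        missed = n not in attended
        rows.append({
            "subject": subject,
            "visit": n,
            "planned_date": planned,
            "missed": missed,
            # reason comes from eCRF or protocol deviations, where available
            "reason": reasons.get(n) if missed else None,
        })
    return rows

rows = missingness_rows(
    subject="1001-003",
    first_dose=date(2020, 1, 6),
    cutoff=date(2020, 5, 1),               # e.g. discontinuation date
    attended={1, 2},                        # visits actually performed
    reasons={3: "COVID-19 site closure"},   # derived from deviations
)
```

Summing the `missed` flags across participants, by visit or by month, then feeds the monitoring visualisations discussed above.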

Below is an example of visualisations of missing endpoints for a particular study using R Shiny.

Another interesting analysis is to review the lag time between the start of an event and its entry into the EDC, comparing the period before the COVID-19 situation with the lockdown period and after. The results can prompt discussion with sites to determine individual site status and whether or when sites have returned to normality with regard to event/assessment reporting.
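A simple way to quantify this lag is to compare the median days from event onset to EDC entry before versus during lockdown. The sketch below uses hypothetical event dates and an assumed lockdown cut-point:

```python
from datetime import date
from statistics import median

LOCKDOWN_START = date(2020, 3, 15)  # assumed cut-point for this illustration

# Hypothetical events: (event onset date, EDC entry date)
events = [
    (date(2020, 1, 10), date(2020, 1, 13)),
    (date(2020, 2, 2),  date(2020, 2, 6)),
    (date(2020, 4, 1),  date(2020, 4, 20)),
    (date(2020, 4, 18), date(2020, 5, 2)),
]

# Split entry lags (in days) by whether the event started before lockdown
lags_before = [(entry - onset).days for onset, entry in events
               if onset < LOCKDOWN_START]
lags_during = [(entry - onset).days for onset, entry in events
               if onset >= LOCKDOWN_START]

print("median lag before lockdown:", median(lags_before), "days")
print("median lag during lockdown:", median(lags_during), "days")
```

Tracking this median per site over rolling periods shows when a site's reporting returns to its pre-pandemic baseline.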

Will the Amount of Missing Data Affect Power?

If the amount of missing data is thought to affect power, several mitigating actions can be employed. One possible approach is to recruit more patients; however, this carries a risk of overpowering the trial if the amount of missing data turns out to be less than expected. Another option is to amend the protocol to allow a longer period of follow-up and/or delay data cuts or database locks. All of these options would require a review of the protocol and likely an amendment. The reporting of event-driven studies may also be affected and require mitigating actions. Further analyses may need to be added to the statistical analysis plan, either as sensitivity analyses or additional sub-group analyses, as discussed in the section below.
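One common rule of thumb for the recruit-more-patients option (a general convention, not taken from the guidance above) is to inflate the required evaluable sample size by the anticipated rate of missing or unevaluable outcomes, n_adjusted = n / (1 − d):

```python
import math

def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Patients to randomise so that roughly n_required remain evaluable,
    given an anticipated proportion of missing/unevaluable outcomes."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))

print(inflate_for_dropout(200, 0.10))  # 10% anticipated missingness -> 223
print(inflate_for_dropout(200, 0.25))  # 25% anticipated missingness -> 267
```

The gap between the two calls illustrates the overpowering risk noted above: planning for 25% missingness but observing only 10% leaves the trial with far more evaluable patients than the design requires.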

DMCs and Interim Analysis to Identify Impact on Trial Validity

To identify the impact of the pandemic on the analysis, it is recommended that an analysis of the trial data is conducted by an independent Data Monitoring Committee (DMC), which may already exist for the trial. If not, an independent DMC should preferably be established, following the necessary procedures regarding ethics committees and relevant competent authorities.

  • The statistical analysis section of the protocol and/or the statistical analysis plan may need updating based on the findings.
  • As a general principle, there are strong scientific reasons to conduct trials as planned and implement changes only when there is a convincing scientific reason that it improves interpretability of results.
  • There may be situations, however, where an unplanned (or early) analysis should be considered to minimise the effect of COVID-19 on the interpretation of the data at the risk of having lower power than originally planned. These include situations where:
    • The trial is close to completion.
    • A planned interim analysis is due soon.
    • The trial needs long-term follow-up to observe the primary outcome, especially where enrolment is slowing down or even paused during the pandemic.


Analysis and Reporting of Missing Data at the Conclusion of a Trial

The FDA and EMA have issued guidelines on how the COVID-19 pandemic may affect the conduct of clinical trials. Based on this guidance, here are some principles to consider during the analysis and reporting of missing data due to the pandemic.

1. Impact on Treatment Effect Evaluation

In order to identify and address concerns about the impact of the pandemic on treatment effect evaluation, a sufficient amount of information will be necessary on pandemic-related measures, on whether trial participants or trial conduct were affected, and on the subpopulations of exposed/non-exposed and infected/non-infected participants. Some recommendations are:

  • Consider whether trial objectives are affected by the pandemic and whether trial estimands need to be modified (e.g. using alternative endpoints).
  • Consider design and analysis strategies and ways to handle potentially altered endpoints, higher measurement variability and missing visits.

2. Impact on Validity of the Trials

The external validity of trial outcomes may be affected by the presence of different trial populations: some participants were present in the trial before the start of the pandemic; some during the pandemic while possibly exposed to associated measures; and some after the end of the pandemic.

Consider an analysis of the accumulating trial data in order to evaluate the implications on recruitment, loss of participants during the trial, the ability to record data and to interpret the treatment effect in light of the pre-, during and post-pandemic measures phases.


3. Missing Visits/Endpoints

As noted earlier in the blog, changes in study visit schedules, missed visits, or participant discontinuations may lead to missing information (e.g. for protocol-specified procedures). Regulatory agencies expect this information to be summarised in the clinical study report by sponsors (FDA, 2020).

A few recommended analyses are included below for consideration. Readers are encouraged to utilise these suggestions as general guiding principles to formulate relevant analyses and displays that best fit their situation.

Any pre-planned analyses that are modified because of the COVID-19 pandemic should be explained, with the assumptions clearly stated and the type and amount of missing data for each estimator characterised and documented. A broad range of sensitivity analyses and methods for estimating missing data should also be considered.

Supplementary analyses using historical clinical trial data or real-world data to support assessment of the impact of missing data elements might be helpful. Summaries of missed doses affecting dosing compliance, disposition tables showing the impact of COVID-19, and displays summarising the amount of missing data for the primary (and possibly key secondary) efficacy analyses should be considered. In addition, displays explaining the impact on lab data used as a trial endpoint that could not be measured in the central lab and had to be measured locally would be good considerations. Additional displays exploring further sub-groups (e.g. infected vs non-infected, site vs remote, age categories) should be considered if necessary.


4. How Much Missing Data is Too Much?

The COVID-19 pandemic has created an exceptional situation in which study teams may face a large amount of missing data, sometimes critical to study validity and patient safety. The specific reasons and situations were described in previous sections, and sponsors and sites have proposed many strategies to address them and minimise their impact. Even with the best intentions, however, ensuring 100% data completeness while also protecting patient, site staff and sponsor staff safety and adhering to country epidemic regulations proved difficult or impossible under lockdown.

Regulators supported industry by issuing guidance, and several approaches have been described above, from data collection and adjusting visit frequency through to proper reporting in the CSR. Over time we will see what the final impact has been on study validity.

So how much data missingness is too much?

There is no single answer. It depends on many factors: the disease under study, the specific study design, the therapy and many others. For example, missed disease evaluation visits may impact the reportable results of an oncology study much more than those of a vaccine trial. Similarly, missing safety data from a Phase II study testing a therapy with a narrow therapeutic window may be more impactful on patient safety oversight than in an observational study of a well-established therapy. Each study will need individual evaluation by the study team, and the statisticians overseeing the statistical plans will have a prominent role in interpretation. Support and collaboration from a clinical data scientist or programmer to operationalise the statistical plans will be critical in this regard.

In addition to the methods described in detail in previous sections, one possible technique to verify the impact of missing data is a Quality Tolerance Limits (QTLs)-like approach.

In a nutshell, the QTL concept, defined in ICH GCP E6(R2), was established to ensure the implementation of proactive measures to secure data quality/validity and patient safety at the study level.

A QTL is a level, point or value associated with a parameter that is critical to study quality. QTLs are identified at the study level and, if exceeded, may indicate a systematic issue that can impact subjects’ safety or the reliability of trial results; as such, exceeding a QTL should trigger an evaluation.

Typically, QTLs are defined at the beginning of the study, are driven by Critical to Quality factors and followed over the course of the study. If there is a risk of crossing the threshold, corrective or preventive action takes place.

When defining QTLs, teams focus on what truly matters for study validity. TransCelerate recommends selecting 3–5 parameters (TransCelerate, 2017). Threshold setting is driven by statisticians and clinicians, using historical data from similar studies and the current study’s statistical plans. If a QTL targets a parameter related to missing data, the team can immediately assess its impact.
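A QTL-style check on a missing-data parameter might look like the sketch below. The parameter name, early-warning (secondary) limit and threshold values are hypothetical; in practice they would be set by statisticians from historical data and the statistical analysis plan:

```python
# Hypothetical QTL definition for a missing-data parameter
QTL = {
    "parameter": "pct_missing_primary_endpoint",
    "secondary_limit": 8.0,   # early-warning level (%)
    "qtl_threshold": 12.0,    # crossing this triggers formal evaluation (%)
}

def evaluate_qtl(observed_pct: float, qtl: dict) -> str:
    """Classify an observed value against the QTL and its early-warning limit."""
    if observed_pct >= qtl["qtl_threshold"]:
        return "QTL exceeded: evaluate root cause, report in CSR"
    if observed_pct >= qtl["secondary_limit"]:
        return "Approaching QTL: consider corrective/preventive action"
    return "Within tolerance"

print(evaluate_qtl(5.0, QTL))
print(evaluate_qtl(9.5, QTL))
print(evaluate_qtl(13.2, QTL))
```

Evaluating the parameter at each data snapshot gives the team an early signal before missingness threatens study validity.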

A sponsor may not currently use QTLs addressing missing data, or may not yet have a QTL process in place. Even so, a similar approach could be used: take a snapshot of the study, define the parameters and thresholds once using the same methodology, and monitor them going forward.

Although this approach does not follow the intention of ICH GCP to set QTLs at the beginning of the study, it seems that it can be very practical under the current COVID circumstances.

One important thing to remember is that the use of QTLs requires formal reporting in the CSR. This should not be a problem, as sponsors are already required to report COVID-related issues in CSRs.

We recommend TransCelerate guidance on QTLs for those who would like to consider this approach (see References).

In summary, the COVID-19 pandemic will impact the conduct, analysis and reporting of clinical trials. It is essential for clinical data scientists to understand these impacts and take an active role to help mitigate, monitor and report the amount of missingness.



References

U.S. Food and Drug Administration (FDA) (2020), Guidance on Conduct of Clinical Trials of Medical Products During COVID-19 Public Health Emergency: Guidance for Industry, Investigators, and Institutional Review Boards.

European Medicines Agency Committee for Medicinal Products for Human Use (EMA/CHMP) (2020), Guidance on the Management of Clinical Trials During the COVID-19 (Coronavirus) Pandemic.

O’Kelly, M. and Ratitch, B. (2014), Clinical Trials with Missing Data: A Guide for Practitioners, New York: John Wiley & Sons.

TransCelerate Biopharma Inc. (2017), Risk-Based Quality Management: Quality Tolerance Limits and Risk Reporting.
