Sample size calculation is important, not only because of industry regulations, but also because of ethical considerations. In this blog, I am going to highlight the importance of sample size calculation and give you 8 key considerations before you calculate sample size and ask for a biostatistician’s help.
It is important to reiterate that we recommend avoiding generic sample size calculators for clinical trials, as the many variables between different medical device studies vastly increase calculation uncertainty. In addition, medical device sample size justification may be negatively impacted by utilizing simplified tools for sample size calculations.
Requirements for justification of sample size are mentioned in the EU medical device regulations (EU MDR Annex XV ch. III). When planning for clinical data collection activities, you must provide justification for how you arrive at the sample size.
ISO 14155:2020 also mentions the needs for sample size calculation. You need to provide justification for your sample size in the clinical investigational plan. Sample size is also mentioned in the MDCG guidelines for PMCF plans along with multiple FDA guidelines and the ICH GCP.
It is interesting to note that in all these documents, the topic of sample size shows up as a short sentence in between many other bullet points/requirements. The detailed requirements of how the sample size is calculated or justified is often left out.
This vagueness is important to clarify because, having assisted hundreds of companies initiate clinical data collection, we have seen this topic being largely overlooked. There is a common tendency to put minimal focus on the topic of sample size in clinical investigational plans, PMCF plans, or study protocols.
The evaluation of clinical performance of a medical device should be based on scientific rationale, not randomness. This highlights the importance of using the appropriate sample size in your activity. If the sample size is off, insufficient, or insufficiently justified you will risk not being able to scientifically justify your performance or safety accordingly.
‘’Using the power of 0,80 and α of 0,05 it was determined that a sample size of 104 patients is sufficient to prove the null hypothesis.’’
The example above is a common way of describing sample size calculation, however, it is not sufficient. But why do people tend to focus less on this? One reason could be that there is a scarcity of sample size calculation knowledge from clinical experts.
Furthermore, sample size calculation has a lot to do with mathematics and people tend to rely on biostatisticians or mathematicians to do it for them.
However, sample size calculation for clinical studies relies heavily on clinical input and expertise which, in our experience, not many are aware of.
So, the example above lacks proper justification. If you have written a clinical protocol which includes a sentence like that, I would suggest you keep reading!
That is because you are not addressing the purpose of your sample size. You are not clarifying the hypothesis you’re basing your research on and you are not identifying the clinical assumptions that you’ve made to arrive at the sample size you have calculated.
So, the justification of the sample size calculation for a clinical study or PMCF activity needs to include these elements. It’s essential to not only talk about the statistical assumptions but also clinical assumptions that relate to sample size calculation. Read our guide to generic and specific PMCF activities to help you select the right activity for your device and comply with EU MDR.
It’s important to note that sample size must be calculated for each individual clinical study/PMCF activity. This is especially important, because people tend to think that when the guidelines or the EU MDR mention PMCF and sample size together they understand it as they need to calculate sample size as a whole spectrum out there in the market. Read our complete guide to PMCF compliance under EU MDR.
If you’re doing a survey study, you need to determine the sample size for that study as well. You do not determine the sample size across the entire portfolio of potential applications of your medical device. Sample size calculation is activity specific.
The 8 aspects below are simple things that you can consider and discuss internally before you include a biostatistician in the sample size calculation.
Ask yourself this: ‘’What is the purpose of this PMCF activity? What are we trying to accomplish here?’’
Clinical teams tend to decide on a study activity, without considering whether that fits with the clinical strategy. Is a survey convincing enough for the Notified Body? Are you exploring the area of interest, or confirming something?
The purpose and objective of a study are two different things. Your purpose could be to learn or to prove something. Whether it’s learning or proving something, there are different methods that can be used to determine your sample size.
The objective is equally important, and determines the statistical testing you want to do on the data. Is the objective to prove something? What? That your device is superior, equally as good, or non inferior? The study objective highly correlates with the type of mathematical equations that are actually used to calculate sample size. So, the simplicity of defining your study purpose and objective will help as to which equations to use.
This is usually one of the first things that clinical experts define when designing a clinical investigation. What are the primary endpoints? What are the main clinical results for which we are gathering clinical data to show, illustrate or prove a clinical hypothesis.
If you have a lot of endpoints, you will have a hard time determining your sample size. You need to be focused on establishing one or two, at most, primary endpoints because these will be used to calculate your sample size.
These endpoints can be different variable types that can impact your sample size calculation, whether it’s a discrete variable (yes, no), continuous (heart rate, blood pressure), etc. All of these types impact the calculation.
The decision to define single or multiple primary endpoints depends on the type of device risk, or activity. But agreeing on which primary endpoints to focus on, and why, is very important, because you don’t want to have that discussion when you’re already paying for someone to calculate your sample size. This discussion can easily turn into a debate about which endpoint is best to focus on, and it is not something you should be doing in front of a biostatistician.
Study design is another straightforward element that clinical teams debate extensively during the planning of a study. A few examples of study designs are: randomized, adaptive, non-randomized, cohort, case-control, cross-sectional.
Which study design you utilize will influence your sample size. All 8 aspects mentioned in this blog can affect the sample size.
That is because the different models and mathematical equations that are used to calculate sample size are designed differently depending on how the data has been collected and how you’ll compare the data between different populations.
If you’re doing a randomized control study, or a self-control study, those aspects of you collecting and comparing the data will impact the sample size.
The statistical hypothesis is often misunderstood. What you hypothesize will happen in your study is not necessarily the same as the statistical hypothesis that you define.
Your hypothesis might be that your device is equivalent to the current gold standard. However, the statistical hypothesis needs to be more quantitative and measurable. For example, you might need to demonstrate that there is no difference in mean arterial pressure (MAP) between your device and the gold standard.
Now we’re getting somewhere! However, we still need to express this in statistical or mathematical terms: which mathematical/statistical analysis will we utilize and how/why will we define the variables of the analysis to appropriately represent this arterial pressure comparison?
The null hypothesis and alternative hypothesis that you determine for your study must be in line with your study purpose and objective.
Let’s take another example of device A and device B, where it’s believed that the mean arterial pressure (MAP) will be different among the groups that would use device A and device B respectively.
In this case, you have the null hypothesis which states ‘’there’s no difference in MAP between the two groups” and the alternative hypothesis that there’s a difference in MAP between the two groups.
You don’t need to worry about the mathematical stuff. Your task is to transfer or translate what will happen in the study (general hypothesis) into statistical terms. If you have a good idea of what you’re going to measure, and for what reason (purpose and objective), then it’s fairly simple for a mathematician or biostatistician to determine the null and alternative hypothesis.
Also called ‘’alpha’’, or in simpler terms ‘’the probability of rejecting the null hypothesis”, the significance level is one of the most prominent variables used in the justification of sample size.
The probability of rejecting the null hypothesis of a medical device study is recommended to be at least 0.05 (5%). This is a general recommendation by the industry and organizations like the FDA and ICH in guidelines on statistical principles for clinical studies.
Having said that, the probability can also be less. What I mean by that is, you want to be even more confident that you will be able to reject the null hypothesis.
Another important parameter is Power, which is very often determined to be 80%. Power is a mathematical equation ‘’1-β”, where beta (β) is 0.20 (20%).
As per guidelines from FDA, ICH on statistical principles for medical device studies, you should strive to use a minimum 80% Power, but preferably higher. An increase in the Significance Level, Power, will result in a larger sample size. With a higher Power, you are trying to ensure that the probability of your study identifying an effect is higher.
Power is a parameter that you can tweak, and you can discuss that with a biostatistician. If you increase Power, you can make sure that your device shows more difference and better outcomes of a clinical measurement.
If you want insights on how study design can impact sample size, as well as how to better prepare for a discussion on sample size with a statistician, watch this free webinar.
In the image above, a “Type II” error means that your findings have shown that your device does not actually reduce MAP. The hypothesis was that the device helps to reduce mean arterial pressure, whereas reality illustrates it’s not true. That’s considered a “Type II error (β)”.
This table is helpful because it makes it easier to see how these two parameters might affect the study results and findings versus what is really happening.
This is something that clinical teams tend to forget to include in their sample size justification. Going back to the MAP example, if your goal were to investigate whether there is a difference between device A and device B you want to determine the minimum clinically meaningful effect.
The minimum clinically meaningful effect is the measured difference for a given endpoint between two references (e.g., between a medical device and the golden standard).
To put it more simply, is there a clinician that can tell you that if you lower the MAP from 100 to 90 it has a minimum clinically meaningful effect on a patient? If so, it is a measurement you can add to your sample size statistical calculations.
You need to determine whether there is a minimum clinically meaningful effect in using your medical device in the first place, which then can be used to calculate your sample size. Let’s look at a more complex situation. Let’s say you have a novel technology where the reduction of a certain parameter is only minimal.
The clinical expert says that small change (e.g., from 0.1 to 0.2) would produce no clinically meaningful effect. In this case it will be a little problematic for you to conduct your study with a sample size that makes sense.
In the case of a bigger effect, for example from 0.1 to 0.5 which is where the clinician believes it can start to be more clinically meaningful, your sample size can be hugely impacted. This might result in you having to conduct a study with 5000 subjects, for example, instead of only 10.
And that is why the clinically meaningful effect is so important to take into account before calculating your sample size. For some, it might even be the most important factor impacting sample size.
This element can be difficult to include in your sample size calculation, because usually it needs to be built upon prior knowledge or assumptions made in your field of expertise.
Before speaking to a biostatistician, you should look at early-stage data that you might have. Otherwise try to identify publications, articles, or research papers that have documented something similar within your area of expertise. Look up studies with similar parameters or measurements where variability or standard deviation is mentioned.
Doing this prior research will help you in your own efforts to calculate sample size.
If you for example find a study where they concluded there’s a standard deviation of “X’ between an individual of 18 years old and 50 or 80 years old, it’s very useful information to share with a biostatistician or mathematician.
So, you can estimate the standard deviation or variability based on clinical assumptions/results, use input from prior pilot studies or in-human studies, and literature.
Remember that sample size justification requires clarification of both clinical and statistical assumptions. It’s not enough to only clarify Alpha or Power to determine sample size of ‘’X’’.
You need to make sure that you can also document and illustrate to reviewers that the assumptions you made about the clinical benefits and meaningful effect, for example, support your sample size justification. In other words, prove your sample size calculation is feasible for your particular purpose.
Greenlight Guru Clinical is the only electronic data capture (EDC) software designed for Medical Devices and Diagnostics.
Greenlight Guru Clinical provides capabilities for gathering data in clinical studies, performance studies, PMCF/PMPF studies/surveys, registries, cohorts, case series, as well as from connected devices and wearables.
Get in touch to learn how you can design highly customized eCRFs, or collect patient reported outcomes (ePRO) for PMCF under EU MDR.
Jón Ingi Bergsteinsson, M.Sc. in Biomedical Engineering, is the co-founder of Greenlight Guru Clinical (formerly SMART-TRIAL). He was also the technical founder of Greenlight Guru Clinical where he paved the way for the platform’s quality standards, data security, and compliance.