6.2 Registry research design
Early in the registry development phase it is necessary to determine an overall research plan or design that defines the registry's characteristics and (future) operation from a research-methodological point of view. Several elements have to be considered at this stage of registry development.
- Research questions or hypotheses should be properly formulated.
- It is essential to clearly define the target population of the registry. The registry should also be defined in terms of geographical and organizational coverage.
- A definition of the cases to be included in the registry should exist. Inclusion and exclusion criteria have to be determined and clearly stated.
- It is necessary to understand which study model can be applied in a registry (e.g. cohort, case-control, nested case-control etc.).
- Anticipated registry size and duration should be estimated.
- The registry data collection procedure has to be determined. It must support the highest possible data quality, the lowest possible burden for the reporting units and the lowest possible costs for the registry.
- If follow-up is planned, a clear follow-up strategy should exist.
- Thorough documentation for the entire data collection protocol, including guides for data providers and other users, should be prepared.
- Representativeness and generalizability of a registry should be considered and appropriately described for data interpretation purposes.
- During the registry design phase, time, costs and registry resources need to be constantly taken into account.
Once the purpose and main objectives of the registry are defined, the next step is to define the data to be collected and to determine the methodology or protocol by which the registry will pursue those objectives. At this point, the registry holder needs to consider many issues, including the definition of the registry target population, the anticipated registry size and duration, the study design, the data sources, the registry dataset and the data collection methods and procedure. At the same time, the registry holder needs to take into account the registry’s resources and costs and consider data quality. This chapter describes these elements of a registry and covers the aspects that need to be taken into account during this stage of development.
- 6.2.1 The population covered by a registry
- 6.2.2 Anticipated size and duration
- 6.2.3 Registry dataset
- 6.2.4 Data collection procedure
- 6.2.5 Research-based registries - additional points to consider
- Notes
- References
6.2.1 The population covered by a registry
Enrolment of the patients for a registry starts with a clear understanding of the target population, which is a population to which the registry would like to generalize its results and findings (e.g. patients with multiple sclerosis in Slovenia). When building a registry it is important to accurately define the target population since it is a key factor in forming the registry population. The registry holder should understand and determine whether the registry is a hospital-based registry[note 1], population-based registry or even a population registry.[note 2] It is necessary to define the registry in terms of geographical and organisational coverage.
In addition to the target population, it is recommended that a registry provide a case definition, which is a detailed specification of the patients or cases that are going to be included in the registry. The registry team should specify eligibility or inclusion criteria, a set of conditions that a patient must meet to be eligible for inclusion in the registry. These generally include geographic (e.g. hospitals in a particular region of the country), demographic (e.g. age, gender, ethnicity), disease-specific (e.g. a certain diagnosis, stage of disease), time-specific (e.g. specification of the included dates of hospital admission), laboratory-specific, and other criteria (e.g. size of the hospital in terms of number of patients). Exclusion criteria, by contrast, are those criteria that disqualify subjects from inclusion in the registry. Inclusion and exclusion criteria often reflect considerations such as cost and practical constraints (sometimes subjects are excluded not because they are of no interest, but because of the additional cost or burden of including them), ethical concerns, people’s ability to participate (e.g. their health condition may prevent participation), and design considerations (a more homogeneous population can be advantageous as a means of reducing confounding, but stringent inclusion criteria might reduce the generalizability of the registry findings to the target population). Inclusion and exclusion criteria should therefore be defined carefully, taking many aspects into account, as their selection can optimize the internal validity or generalizability of the registry, improve its feasibility (also in terms of follow-up and attrition), and lower its costs.
Besides very clear definitions of the inclusion and exclusion criteria, it is crucial that the criteria are well documented, including the rationale for them.
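As an illustration only, the criteria types described above can be written down as explicit, documented checks. The following Python sketch is hypothetical: the `Patient` fields, the region, the age limit, the diagnosis and the enrolment window are assumptions chosen to mirror the examples in the text, not criteria recommended by this guideline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    # Hypothetical fields, one per criterion type discussed in the text.
    region: str
    age: int
    diagnosis: str
    admission_date: date
    able_to_participate: bool


def check_eligibility(p: Patient) -> tuple[bool, list[str]]:
    """Return (eligible, reasons); the reasons document why a case was left out."""
    reasons = []
    # Inclusion criteria: every one of them must be met.
    if p.region != "Slovenia":
        reasons.append("geographic: outside the covered region")
    if p.age < 18:
        reasons.append("demographic: younger than 18")
    if p.diagnosis != "multiple sclerosis":
        reasons.append("disease-specific: diagnosis not covered")
    if not (date(2015, 1, 1) <= p.admission_date <= date(2015, 12, 31)):
        reasons.append("time-specific: admission outside the enrolment window")
    # Exclusion criterion: disqualifies an otherwise eligible patient.
    if not p.able_to_participate:
        reasons.append("exclusion: health condition prevents participation")
    return (not reasons, reasons)
```

Returning the reasons together with the decision keeps the rationale for each exclusion on record, which supports the documentation requirement mentioned above.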
6.2.2 Anticipated size and duration
Estimating the anticipated registry size is an important part of the planning process. Some registries try to include all cases from the defined population, but registries often include only a sample of the population. In that case it is recommended to estimate in advance how many cases the registry plans to include. If the registry is too small, it may have insufficient statistical power and may not allow adequate exploration of its objectives. On the other hand, a registry that is too large may waste time, resources and money. Hence, it is important to plan the registry’s size adequately. Various components affect the estimate of registry size and need to be considered, including:
- the study outcome and its frequency/variability;
- size of clinically important effects, the desired precision of estimates (e.g. the width of a confidence interval);
- timeframe (e.g. for analyses, dissemination of results);
- available resources and money, feasibility;
- support for regulatory decision-making (e.g. if the registry is intended to support regulatory decision-making, the precision of the estimate is important);
- anticipated drop-out rate.
Many methods for sample size calculation exist and are described in general statistics textbooks. Various tools can also assist in sample size calculation. Besides software programs (e.g. G*Power, nQuery Advisor, PASS, Stata), there are also online tools that allow free sample size calculations, such as:
- Russ Lenth's Power and Sample Size
- David Schoenfeld's Statistical Considerations for Clinical Trials and Scientific Experiments
- UCLA Calculator Service
- The Survey System's sample size calculator
- Raosoft's sample size calculator
These tools should be used with caution, since they are not always reliable or suitable for every situation.
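To make the interplay between precision and drop-out concrete, the sketch below applies one textbook formula, the normal-approximation sample size for estimating a single proportion, n = z² · p(1 − p) / d², and then inflates the result for anticipated drop-out. It is a minimal illustration under stated assumptions (a simple random sample, a large population, one outcome expressed as a proportion); the function name and its parameters are hypothetical, and it does not replace a proper calculation matched to the registry's actual design.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected_p, half_width, confidence=0.95, dropout=0.0):
    """Cases to enrol so that a proportion is estimated within +/- half_width.

    expected_p  anticipated proportion with the outcome (assumption of the planner)
    half_width  desired half-width of the confidence interval
    dropout     anticipated fraction of enrolled cases lost before analysis
    """
    if not 0 < expected_p < 1 or half_width <= 0 or not 0 <= dropout < 1:
        raise ValueError("inputs out of range")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_analysed = z ** 2 * expected_p * (1 - expected_p) / half_width ** 2
    # Enrol more cases so that enough remain after drop-out.
    return math.ceil(n_analysed / (1 - dropout))


print(sample_size_for_proportion(0.5, 0.05))               # 385
print(sample_size_for_proportion(0.5, 0.05, dropout=0.2))  # 481
```

Halving the desired half-width roughly quadruples the required size, which is one reason the desired precision has to be weighed against the available resources.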
Although a patient registry is generally considered a long-term and sustainable undertaking, the anticipated duration of a registry (taking into account the enrolment and follow-up phases) should also be specified when developing a registry. The duration of a registry depends on what type of registry it is, what the specific procedures in the registry are, and what objectives need to be met. Some registries collect data at only one time point while others collect data for the lifetime of the patient. A registry may be open-ended or it may have a fixed end point, set for when enough data to achieve the registry’s objectives are expected to have accrued. Setting aside funding, which is the biggest factor in registry duration and sustainability, the factors that the registry holder, together with the key stakeholders, should consider when estimating registry duration include the induction period for the desired outcomes, sufficient follow-up time for the exposure, the data collection method, the sample size, the complexity of the data being collected, the anticipated accrual of enrolled subjects, and deadlines for dissemination of results.
It is worth noting that registry size can also refer to the number of sites included in a registry, and to the volume and complexity of the data being collected. A registry holder can therefore consider these perspectives as well.
6.2.3 Registry dataset
The registry needs to develop a dataset that will serve the purpose and objectives of the registry. Although some key variables/data elements can be identified and determined early in the development process, developing the dataset can be a very lengthy activity and should not be underestimated, since it is the registry dataset that will eventually determine the usefulness and success of the registry. More information on developing a registry dataset is available in chapter 6.3.
6.2.4 Data collection procedure
The decision on how the registry will collect the data is affected by several factors, namely the characteristics of the registry’s target population, the information that needs to be obtained and other specific goals of data collection, available data sources, registry resources and time limits. The registry data collection procedure must support the highest possible data quality, lowest possible burden for the reporting units and lowest possible costs for the registry. The registry holder needs to identify and evaluate all available data sources and determine which one will be used. The registry must make an agreement with the data providers and develop the technical protocol [note 3] for the data acquisition. (More information on data sources is available in chapter 6.4 ‘Data sources for registries’.)
When developing a registry data collection procedure, the registry should take into account the technological aspect of data collection (e.g. paper-based forms, web-based data entry, use of personal computers, handheld computers, scanners, mobile phones) and be aware of the advantages and disadvantages of both paper-based and electronic approaches. The choice of system depends on where the data are captured, by whom, and what resources are available to the particular reporting unit. It is important that the approach is practical and reliable. In addition, a registry designer should draw on the perspective of survey methodology, a field that places great emphasis on the modes of data collection, their characteristics and principles of good practice. This includes, for example, considering whether the case report forms or questionnaires are understandable and easy to use; whether questions and instructions are worded correctly; whether they measure the right things; whether the presence of an interviewer or data collector (e.g. a nurse) would influence a patient’s answers; whether a self-administered mode would yield more honest answers or produce a lower response rate; whether telephone data collection could be used to obtain data more cost-effectively; and whether a given mode would enable response from all patients.
Registry data collection can be cross-sectional (transversal), where all defined patients are registered once, or longitudinal, where data are collected at different time points for the same patient. In the case of a longitudinal design, the registry should carefully determine (a) which data need to be (re)collected, (b) at what time points (e.g. every 6 months), (c) for how long (e.g. for 10 years) and (d) by what means (e.g. by telephone, by visiting a general practitioner, by data linkage to other records). When developing the follow-up strategy it is important to consider the costs, which can increase significantly when follow-up is implemented via personal contact, the extra work placed on the data providers, and the burden imposed on the patient. The latter can quickly become an issue, as patients’ willingness to provide data is easily exhausted. This may result in loss to follow-up, which can lead to biased results, especially if the losses are not random. For example, if follow-up data are obtained only from satisfied patients with encouraging outcomes, while dissatisfied patients with less promising outcomes decline to participate, then the registry does not reflect the true picture. The registry should therefore develop a good patient retention plan that is suited to the target population.
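The effect of non-random loss to follow-up described above can be shown with simple arithmetic. The numbers in this Python sketch are invented purely for illustration: a hypothetical registry of 1,000 patients, of whom 700 have a good outcome, with different retention rates in the two groups.

```python
def observed_good_proportion(n_good, n_poor, retain_good, retain_poor):
    """Proportion of good outcomes among the patients who remain in follow-up."""
    followed_good = n_good * retain_good
    followed_poor = n_poor * retain_poor
    return followed_good / (followed_good + followed_poor)


# Hypothetical registry: 700 patients with good and 300 with poor outcomes.
true_proportion = 700 / (700 + 300)  # 0.70

# Random loss: both groups are retained at the same rate, so the estimate is unchanged.
random_loss = observed_good_proportion(700, 300, 0.75, 0.75)

# Non-random loss: dissatisfied patients with poor outcomes drop out more often,
# so the follow-up data overstate how well patients are doing (about 0.84 here).
selective_loss = observed_good_proportion(700, 300, 0.90, 0.40)

print(round(true_proportion, 2), round(random_loss, 2), round(selective_loss, 2))
```

Random loss reduces the amount of data but leaves the estimate centred on the true value, whereas selective loss shifts it; this is why retention matters beyond the raw number of cases retained.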
Throughout, a registry should prepare thorough documentation for the entire data collection procedure and provide methodological guides, standard instructions and rules for data collectors/providers and other data users. This typically includes information on reporting dynamics, what data need to be collected and how, means of data transmission, established controls for the acquired data (e.g. readability of data, adequacy of records and their number) and access rights. It is often advisable also to describe the typical data flow of the registry, clearly specifying how the data travel from the source to the registry, together with additional information (e.g. key persons/staff included in the process, type of technology and data collection method used, access rights, data transmission, timetables). The description of the data flow can help the registry team and other stakeholders (e.g. the company that will provide the technical solution) to better understand the whole data collection protocol. Among other things, it can also be used when evaluating the data collection protocol (e.g. to identify potential sources of error).
6.2.5 Research-based registries - additional points to consider
Many registries now being developed take a more research-oriented approach. These study-oriented or research-based registries have different characteristics, so some additional points need to be considered when developing this type of registry. However, this does not mean that the points described below should be ignored by registry holders who aim to develop more ‘classical’, wide-ranging registries. After all, the latter can also be seen in some way as research-based registries (i.e. there are always some research questions that the registry tries to answer).
6.2.5.1 Research questions and hypotheses
When the purpose and main objectives of the registry are clearly defined, the next step is to take that purpose or idea and shape it into a researchable question. Research questions and hypotheses narrow the purpose of the study and become major ‘signposts’ for guiding the overall study.
Research questions for registries range from purely descriptive questions aimed at understanding the characteristics of people who develop the disease and how the disease generally progresses, to highly focused questions intended to support decision-making. Research questions in registry-based studies are generally hypothesis generating (i.e. hypotheses are developed after the data are collected and new knowledge is gained) or evidence building, rather than hypothesis testing. However, registries focused on determining clinical effectiveness, cost-effectiveness or risk assessment are commonly hypothesis driven. Regardless of the nature of the research questions (or hypotheses), it is crucial for a registry planner to define them, because all further decisions (e.g. registry population, what data will be collected and analysed) and work in the registry development process are guided by the research questions of interest. Proper formulation of a research question or hypothesis is not an easy task and should not be underestimated. An improperly defined, unfocused or underdeveloped research question or hypothesis creates a risk that the registry will not produce the right results or accomplish its objectives. Accordingly, it is highly recommended that a registry developer invest the time required to develop a suitable research question or hypothesis.
(Research) ideas that serve as a foundation for developing research questions or hypotheses are typically gathered through literature review, critical appraisal of the published clinical information, brainstorming with colleagues, seeking experts’ opinions, and evaluating the expressed needs of patients and health care providers. The clinical questions of interest can also be defined by payers/sponsors of the registry. Thus, it is not uncommon for multiple questions to be set as a result of the interests of different stakeholders. In that case a registry planner should be aware that a higher number of research questions can increase the complexity of the registry study design and of the subsequent data collection and statistical analysis. Registry developers should therefore assess whether it is feasible to answer every question of interest.
When defining research questions or more specific research hypotheses, it is important that they are accurate, understandable and focused enough for the specific registry. The clinical epidemiology literature offers guidance on research questions and hypotheses, for example the FINER and PICOT criteria for a good research question. An example of a research question and hypothesis for a registry is presented in Table 6.4.
Table 6.4
Purpose: Monitoring clinical effectiveness of hip implants
Hypothesis: In Europe, exchangeable neck hip stem implants have a significantly higher revision rate than hip implants with an un-exchangeable neck.
Purpose: Natural history of patients with diabetes
Research question: What are the incidence and prevalence of type 1 diabetes among children and adults in Slovenia?
6.2.5.2 Key exposures and outcomes
In simplified terms, exposure and outcome can be described as a relationship in which one event (the exposure) affects the other (the outcome). In the field of patient registries, the term ‘exposure’ refers to treatments and procedures, health care services, diseases, and conditions, while outcomes generally represent measures of health, onset of illness or adverse events, quality of life measures, and measures of health care utilization and costs.
It is crucial to identify the key exposures and outcomes at the very beginning of registry development, since the selection of exposures and outcomes will affect further registry development (e.g. registry study design, data collection process). The identification of key exposure and outcome variables is guided by the registry research questions that are defined at the registry’s outset. When identifying the key exposures and outcomes it is important to know that sometimes several outcomes need to be selected (as a result of multiple questions of interest), and that exposure often comprises a collection of different information, such as dose, duration of exposure, route of exposure, and adherence. For example, if cigarette smoking is selected as an exposure for measuring a particular outcome (e.g. heart disease), a single binary exposure variable (i.e. smoking or non-smoking) would probably not be enough; other information, such as dose (e.g. how many cigarettes per day) and duration (i.e. how many years of smoking), should also be included. This aspect therefore needs to be considered when identifying key exposure variables, and it is also useful to take into account independent risk factors for the outcomes, as well as confounding variables. More information on selecting data elements for a registry is provided in chapter 6.3.
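The smoking example can be sketched as a small data structure that records dose and duration alongside the binary indicator. The class and field names below are hypothetical. The derived pack-years value (packs of 20 cigarettes per day multiplied by years smoked) is a commonly used summary of cumulative exposure; it is shown here as an illustrative assumption, not as a data element prescribed by this guideline.

```python
from dataclasses import dataclass


@dataclass
class SmokingExposure:
    # Hypothetical data elements for one patient.
    ever_smoked: bool          # the binary variable alone
    cigarettes_per_day: float  # dose
    years_smoked: float        # duration

    def pack_years(self) -> float:
        """Cumulative exposure: (cigarettes per day / 20) x years smoked."""
        if not self.ever_smoked:
            return 0.0
        return self.cigarettes_per_day / 20 * self.years_smoked


# Two patients who look identical on the binary variable but differ in exposure.
light_long = SmokingExposure(True, cigarettes_per_day=10, years_smoked=30)
heavy_short = SmokingExposure(True, cigarettes_per_day=40, years_smoked=5)
print(light_long.pack_years(), heavy_short.pack_years())  # 15.0 10.0
```

Collecting dose and duration as separate elements keeps the option of deriving such summaries later, whereas a binary variable alone cannot be refined after the fact.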
6.2.5.3 Study design
Registry studies are observational studies in which the researcher merely observes and systematically collects information, and, unlike in the experimental studies, does not assign specific interventions to the study subjects being observed. In observational studies the researcher chooses what exposures to study, but does not influence them.
Although patient registries are generally considered prospective observational studies, registries can, from the time perspective, be either prospective or retrospective. Prospective studies are designed to gather data about events that have not happened yet, while retrospective studies are designed to gather data about events that have already happened. Thus, prospective studies look forward in time and retrospective studies look backward.
It is not always simple to define, in traditional epidemiological terms, which study design[note 4] a registry follows. For example, in some situations the study design for a registry can be considered an open cohort or simply a case series of patients with a specific diagnosis. Sometimes the nature of the registry itself does not require a clear specification of its study design. However, a registry designer needs to understand which study model can be applied in a registry. The study designs more commonly applied in registries are the cohort study, case-control study, nested case-control study, case-cohort study, and case series. Other designs, such as the cross-sectional study and the case-crossover design, are also sometimes used. Readers are encouraged to consult epidemiology textbooks and articles for more information on study designs.
6.2.5.4 Comparison groups
A registry can also include and collect data on one or more comparison groups. Although registries usually do not use comparison groups, they are essential when it is important to distinguish between alternative decisions, to assess the magnitude of differences, or the strength of associations between groups. Based on the registry’s objectives three types of comparison groups can be used:
- internal comparison group (data are collected simultaneously for patients who are similar to the focus of interest, but who do not have the condition or exposure of interest),
- external comparison group (data have been collected outside the registry for patients who are similar to the focus of interest, but who do not have the condition or exposure of interest),
- historical comparison group (refers to patients who are similar to the focus of interest, but who do not have the condition or exposure of interest, and for whom information was collected in the past, for example, before the introduction of an exposure or treatment or development of a condition).
When deciding whether to include a comparison group in a registry, the registry developer should also consider that adding a comparison group may add complexity, time, and cost to a registry.
6.2.5.5 Sampling frame and sampling method
Registries sometimes try to include all units of the target population, but often they include just a sample of the target population from which inferences about the whole population can be made. The need to include only a sample of the target population typically arises because of limitations of time and resources, but also because of other constraints. The activity of selecting cases (i.e. patients, institutions, objects or events) into a sample from a larger collection of such cases, according to a specific procedure, is called sampling. Ideally the sample is drawn directly from the target population, but usually this is not the case, because a sample can be drawn only from cases to which the registry or participating sites have access (i.e. the accessible population). Hence, the accessible population represents the sampling frame from which a sample is selected. Sometimes the accessible population is the same as the target population, but usually it is a subset of the target population. A registry planner should be aware of this issue because non-coverage of certain parts of a target population can lead to biased estimates. In other words, if cases of the target population who cannot be sampled (because there is no access to them) differ from those who can be drawn into a sample, the registry findings can be biased. During the sampling phase a registry planner needs to assess what impact the sampling frame and its potential non-coverage could have on the registry findings.
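The size of the non-coverage problem depends on two things: how large the uncovered part of the target population is, and how different it is from the covered part. A minimal Python sketch with invented numbers illustrates this for a mean or rate. The function name and all figures are hypothetical, and in practice the uncovered group's value is usually unknown and has to be judged from external information.

```python
def coverage_bias(n_covered, mean_covered, n_not_covered, mean_not_covered):
    """Bias of the accessible-population mean relative to the target-population mean."""
    n_total = n_covered + n_not_covered
    target_mean = (n_covered * mean_covered + n_not_covered * mean_not_covered) / n_total
    # Equivalent to (n_not_covered / n_total) * (mean_covered - mean_not_covered).
    return mean_covered - target_mean


# Hypothetical target population of 10,000 patients: 8,000 are accessible through
# participating sites (revision rate 5%), 2,000 are not (revision rate 10%).
bias = coverage_bias(8000, 0.05, 2000, 0.10)
print(round(bias, 4))  # -0.01: the frame understates the rate by one percentage point

# If the two groups do not differ, non-coverage causes no bias, however large it is.
print(round(coverage_bias(8000, 0.05, 2000, 0.05), 4))  # 0.0
```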
Many different sampling methods can be used when selecting cases for a registry. Sampling designs are classified as either probability sampling or nonprobability sampling. In general, probability sampling, in which the selection of individual cases (e.g. patients, events) is left to chance rather than to the choice or judgement of a person, is the preferred method. However, in some situations probability sampling is not feasible and nonprobability sampling is more useful. Sampling methods that are often used for generating samples include simple random sampling; stratified random sampling; systematic sampling; cluster sampling; multistage sampling; case series or consecutive (quota) sampling; haphazard, convenience, volunteer, or judgmental sampling; modal instance; purposive; and expert sampling.
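Three of the probability sampling methods named above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a production sampling tool: the function names are hypothetical, the sampling frame is assumed to be a list held in memory, and proportional allocation is assumed for the stratified case.

```python
import random


def simple_random_sample(frame, n, seed=0):
    """Every case in the frame has the same chance of selection."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, seed=0):
    """Take every k-th case after a random start (requires n <= len(frame))."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical frame: 100 patient IDs, 80 from hospital "A" and 20 from hospital "B".
frame = [(i, "A" if i < 80 else "B") for i in range(100)]
by_hospital = stratified_sample(frame, stratum_of=lambda case: case[1], fraction=0.1)
print(len(by_hospital))  # 10 cases: 8 from "A" and 2 from "B"
```

Stratification guarantees that the small hospital is represented, which a simple random sample of the same size does not.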
6.2.5.6 Representativeness and generalizability
When selecting patients, hospitals or events it is important to consider representativeness, since representativeness is an essential component of a registry study. If the sample is not properly representative, conclusions and generalizations may be incorrect. The registry developer should consider representativeness in terms of patients (e.g. men and women, children, the elderly, racial and ethnic groups), sites (e.g. geographic location, practice size, academic or private practice type) and events (e.g. type of events/services on a particular day). The registry developer should critically assess how a potential lack of representativeness can affect the results of a registry. For example, suppose that the purpose of the registry is to monitor the clinical effectiveness of specific surgeries. If the registry includes only academic centres/hospitals with high technical support, then the results probably would not reflect the true picture. On the other hand, when a registry is not representative in terms of gender (e.g. a higher number of women in the registry), this would have no impact on the representativeness of the registry findings if the outcome that is observed (e.g. clinical effectiveness of a specific drug) does not vary with gender.
Associated with representativeness is the concept of generalizability, which refers to the extent to which the conclusions of the registry study can be generalized or applied to populations other than those sampled and included in the registry. Strong generalizability, or external validity, is achieved by the inclusion of a typical patient sample, which is often more heterogeneous (e.g. different demographic characteristics, comorbidity). Patient registries are generally designed to have strong external validity so that their population will be representative and relevant to decision makers. It is important to note that the way in which patients are included, classified and followed directly affects generalizability. For the purposes of data interpretation it is important to describe and document the representativeness and generalizability of a registry, and whether it covers the relevant patients, events and periods of interest.
Notes
- [note 1] A registry that aims to record information on all patients seen in a given hospital or group of hospitals, irrespective of geographical area.
- [note 2] A description of the population registry and population-based registry is provided in chapter 2.2: ‘Types of patient registries’.
- [note 3] A protocol that includes a requirement for granting access, username and password creation, etc.
- [note 4] A study design is a specific plan or protocol for conducting the study, which allows the investigator to translate the conceptual hypothesis and research question into an operational one.
References
- Gliklich RE, Dreyer NA, eds. Registries for evaluating patient outcomes: A User's Guide. 3rd ed.
- Gregg, M.B. Field Epidemiology. New York: Oxford University Press, 2002
- Polit DF, Beck CT. Nursing Research: Generating and Assessing Evidence for Nursing Practice. 8th Edition. Lippincott Williams and Wilkins, a Wolters Kluwer business.
- Eduardo Velasco. Inclusion Criteria. In Encyclopedia of Research Design, eds. Neil J. Salkind. 2010. SAGE Research methods. Available at: http://srmo.sagepub.com/view/encyc-of-research-design/n183.xml
- Noordzij M et al. Sample size calculations: basic principles and common pitfalls. Nephrol Dial Transplant (2010) 25: 1388–1393.
- Altman DG. Practical Statistics for Medical Research. London, UK: Chapman & Hall; 1991.
- Bland M. An Introduction to Medical Statistics. 3rd ed. Oxford, UK: Oxford University Press; 2000.
- Lwanga SK, Lemeshow S. Sample size determination in health studies - A Practical Manual. World Health Organization 1991.
- ISPOR: Taxonomy of patient registries: classification, characteristics and terms.
- Creswell John. Research Design: Qualitative, Quantitative and Mixed Method Approaches, 3rd Edition
- The Yeshiva Fatherhood Project. Introducing qualitative hypothesis-generating research.
- LoBiondo-Wood G, Haber J. Chapter 2. Nursing Research - Methods and Critical Appraisal for Evidence-Based Practice. 2013.
- Hulley S, Cummings S, Browner W, et al. Designing clinical research. 3rd ed. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.
- Brian Haynes R. Forming research questions. Journal of Clinical Epidemiology 2006; 59:881-6.
- International Agency for Research on Cancer. Cancer Epidemiology – Principles and Methods.1999.
- Stark Nancy J. Registry Studies for Medical Devices – Whitepaper and Workshop Invitation. 2010.
- EPIRARE. Deliverable D4: Guidelines for data sources and quality for RD Registries in Europe. 2014.
- Verhamme K. Study designs in Paediatric Pharmacoepidemiology. European Journal of Clinical Pharmacology 67, S1 (2010) 67-74
- Song Jae W, Chung Kevin C. Observational Studies: Cohort and Case-Control Studies. Plast Reconstr Surg. 2010 December, 126(6): 2234–2242
- Rose S, van der Laan M. J. Why Match? Investigating Matched Case-Control Study Designs with Causal Effect Estimation. The International Journal of Biostatistics, Volume 5, Issue 1 2009 Article 1.
- Carlson D. A. M, Morrison S. R. A User’s Guide to Research in Palliative Care: Study Design, Precision, and Validity in Observational Studies. JOURNAL OF PALLIATIVE MEDICINE, Volume 12, Number 1, 2009
- Jepsen P., Johnsen S. P., Gillman W. M., Sorensen H. T. Interpretation of observational studies. Heart 2004; 90:956–960.
- Ernster, V. L. Nested case-control studies. Preventive medicine, 23, 587-590, (1994).
- Langholz, B. Case-Control Study, Nested. Volume 1, 646-655. In Encyclopedia of Biostatistics, 2nd Edition. Eds. Armitage, P. and Colton T. John Wiley & Sons, Ltd, Chichester, 2005.
- Wacholder S, Silverman DT, McLaughlin JK, Mandel JS. Selection of Controls in Case-Control Studies. American Journal of Epidemiology Vol. 135, No. 9. 1992
- Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. The Journal of bone and joint Surgery. 2009, 91 Suppl 3:21-6.
- Dekkers OM, Egger M, Altman DG, Vandenbroucke JP. Distinguishing case series from cohort studies. Annals of Internal Medicine, 2012 Jan 3;156(1 Pt 1):37-40
- Andrews N. Epidemiological designs for vaccine safety assessment: methods and pitfalls. Biologicals. 2012 Sep;40(5):389-92
- Sim J, Wright C. Research in Health Care: Concepts, Designs and Methods. 2000. Stanley Thornes (Publishers) Ltd.
- Brick, Michael J., Ismael Flores-Cervantes, Kevin Wang and Tom Hankins. 1999. Evaluation of the Use of Data on Interruptions in Telephone Service. Proceedings of the Survey Research Methods Section of the American Statistical Association.
- Groves, Robert M. 1989. Survey Errors and Survey Costs. New York: Wiley.
- European Network of Cancer Registries. 2011. Guidelines on Confidentiality and ethics for population-based cancer registration and linked activities in Europe.
- WHO. Training manual for community-based initiatives: A practical tool for trainers and trainees. 2006.
- Laforet P et al. The French Pompe registry. Baseline characteristics of a cohort of 126 patients with adult Pompe disease. Revue Neurologique, Vol 169, Issues 8–9, pages 595–602. 2013.
- Cochran WG. Sampling Techniques. Third ed. Wiley; 1977.
- Carey TS, Sanders GD, Viswanathan M, et al. Methods Future Research Needs Reports, No. 8. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Mar.
- Statistični urad Republike Slovenije. Metodološki priporočniki: Smernice za zagotavljanje kakovosti, št 2. Ljubljana, 2012.