Three steps to write an annotated CRF SDTM through automation

by Sponsored Content
8th Apr 22 11:12 am

Cures for seemingly incurable diseases and infections are the result of hard work in clinical trials. Clinical trials are where proposed medications are tested to see whether they are safe for human use. The process involves a series of tests and data gathering before the clinical conclusions reach health authorities for review and approval.

The data gathering process uses case report forms (CRFs) to collect the source dataset. The source dataset is then compiled into a tabulated dataset, which is what clinical experts submit for review. In the past, pharmaceutical companies would manually map their source dataset to write their annotated CRF SDTM. Now, technological breakthroughs enable research centres to automate the annotation process, which makes for a more seamless process in clinical trials.

Here are the steps you need to follow to write an annotated CRF SDTM through automation.

1. Design your clinical trial CRF

The first thing you have to do before you can automate the writing of your annotated CRF SDTM is to come up with a well-designed case report form (CRF). A CRF is a document used in clinical trials to obtain data from each patient. Its objective is to capture the required patient information in a standardised study format, which is useful for quantitative analysis.

To do this, you should follow the basic principles, considerations, and best practices in CRF design, formatting, and completion. Here are some of the things that the CRF design team should follow:

  • Observe CRF Design Principles – The CRF design team should, as much as possible, follow the generally accepted principles of CRF design. Doing so helps them create a form that highlights the required information and is easy to complete.
  • CRF Layout And Format Considerations – The CRF design team should consider best practices and practical considerations for the CRF format and layout. There are well-designed and poorly designed CRFs. Well-designed CRFs help improve accuracy and efficiency in data collection.
  • Guidelines For CRF Completion – The CRF design team should provide guidelines on how to fill out and complete CRFs. The CRF completion guidelines should contain clear questions, instructions, and prompts. This helps minimise errors and inaccuracies in CRF responses and recording procedures.

2. Use source datasets to determine SDTM domains

The next step in automating the set-up and writing of your annotated CRF SDTM is to prepare your SDTM domains and variables. The Study Data Tabulation Model (SDTM) is a framework or model used to organise and format the data collected during clinical trials. It was developed by the Clinical Data Interchange Standards Consortium (CDISC).

SDTM is based on the data collected from the patients who took part in a clinical trial. These collected data are also sometimes called observations. SDTM presents the dataset in a tabular format for review by the Food and Drug Administration (FDA).

The dataset from the clinical trials has to be sorted out and organised into a table of data for analysis and interpretation. The dataset from the clinical trials is called the source dataset. The dataset, sorted in a table form, is called the SDTM dataset.

The source and SDTM datasets are submitted to the FDA along with the analysis, interpretation, and conclusions of the clinical trial studies. This is vital so that the FDA knows where the data came from and how it was handled, from the clinical trial data collection stage to data tabulation, processing, and final submission.

The clinical trial research team should configure the SDTM domains and variables, which will be mapped to the source dataset, to create their CRF annotation. The observations recorded in CRFs can be classified into three general observation classes or data classes:

  • Interventions
  • Events
  • Findings

Each of these general observation classes has data subsets which are, in turn, called associated domains. An associated domain is a group of observations that can be organised around a common topic. Some examples are vital signs and medical history.

In other words, a domain is simply a set of patient data grouped together because the observations naturally belong together. Another associated domain is called adverse events. Some observations included in this domain are headache, nausea, and dizziness. These are called adverse events because they are unfavourable effects experienced while using a medicine or medical device subjected to clinical trial.
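To make the relationship between observation classes and domains concrete, here is a minimal sketch in Python. The dictionary and helper names are hypothetical, and only a handful of commonly used domains are listed, so treat it as an illustration rather than a complete catalogue.

```python
# Illustrative lookup: domain code -> (domain name, general observation class).
# DOMAIN_CLASS and domains_in_class are hypothetical names used for this sketch;
# only a few well-known domains are included.
DOMAIN_CLASS = {
    "CM": ("Concomitant Medications", "Interventions"),
    "EX": ("Exposure", "Interventions"),
    "AE": ("Adverse Events", "Events"),
    "MH": ("Medical History", "Events"),
    "VS": ("Vital Signs", "Findings"),
    "LB": ("Laboratory Test Results", "Findings"),
}


def domains_in_class(observation_class):
    """Return the sorted domain codes that belong to one observation class."""
    return sorted(
        code
        for code, (_name, cls) in DOMAIN_CLASS.items()
        if cls == observation_class
    )


print(domains_in_class("Events"))  # ['AE', 'MH']
```

A lookup like this lets a mapping tool check that every domain it writes belongs to one of the three classes.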

3. Map the SDTM dataset and generate the annotated CRF

The final step in writing your annotated CRF SDTM through automation is to map your source dataset from the clinical trials to the SDTM dataset. The term annotation refers to the mapping of the source dataset from your case reporting forms to your SDTM dataset. In the past, this was done manually. But with technological advancements, this can now be done using SDTM automation software.
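As a rough illustration of what an annotation is, the sketch below pairs the fields of a hypothetical vital signs CRF page with their SDTM targets. The field labels and the function name are invented for this example; the variable names (VSDTC, VSORRES, VSORRESU, VSTESTCD) are standard Vital Signs variables. Real automation software would place these notes on the form itself.

```python
# Hypothetical CRF page: each field label is paired with its SDTM target.
# The labels are invented; the SDTM variable names are standard VS variables.
CRF_FIELD_MAP = {
    "Date of assessment": "VSDTC",
    "Systolic blood pressure": "VSORRES where VSTESTCD = SYSBP",
    "Diastolic blood pressure": "VSORRES where VSTESTCD = DIABP",
    "Unit": "VSORRESU",
}


def annotate_page(page_title, domain_code, field_map):
    """Build plain-text annotation lines for one CRF page."""
    lines = [f"{page_title} ({domain_code})"]
    for label, target in field_map.items():
        lines.append(f"{label} -> {target}")
    return lines


for line in annotate_page("Vital Signs", "VS", CRF_FIELD_MAP):
    print(line)
```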

CDISC developed two important documents which have become the basis of SDTM standards:

  • ‘Core’ Study Data Tabulation Model (SDTM)
  • SDTM Implementation Guide (SDTM-IG)

The core model provides a standard set of variables. These standardised variables are grouped into different classes according to their roles. The variables are then assembled into collections for specific use cases, called SDTM-IG domains. The SDTM core model can also be used for studies performed on non-humans, which use the SEND-IG domains for their particular use cases.

The SDTM framework assigns each domain a two-character domain code, which is used as a prefix for the variables in that domain. For instance, the Vital Signs domain has the domain code VS. This code is prefixed to the variables related to the domain. If the variable used for tests, for instance, is TESTCD, then the resulting variable name would become VSTESTCD.
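The prefixing rule is simple enough to express in a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the two-letter domain codes described above.

```python
def sdtm_variable_name(domain_code, fragment):
    """Prefix a variable fragment such as TESTCD with a two-letter domain code."""
    if len(domain_code) != 2 or not domain_code.isalpha():
        raise ValueError("domain code must be exactly two letters")
    return (domain_code + fragment).upper()


print(sdtm_variable_name("VS", "TESTCD"))  # VSTESTCD
print(sdtm_variable_name("AE", "TERM"))    # AETERM
```

Note that identifier variables shared across domains, such as STUDYID and USUBJID, do not take a domain prefix.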

Each domain in the SDTM framework contains a dataset. This dataset is a collection of related data taken from the source dataset. SDTM datasets have a set of named variables. The names have been standardised for ease of recording, retrieval, use, collaboration, and review.

When all the variables in the source dataset are mapped to their corresponding variables in the SDTM framework, the automation software can finalise the mapping and generate the annotated CRF with the tabulated datasets. Here are the things you need to do to implement the SDTM mapping:

  • Compare the source dataset to the SDTM metadata and directly map whenever possible.
  • Map the source dataset to the SDTM domains.
  • Map the source dataset variables to the SDTM domain-designated variables.
  • Convert the data from the source dataset that needs data processing and conversion using SDTM mapping tools.
  • Validate the SDTM dataset.
  • Generate the Define.xml file and then validate it.
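The mapping, conversion, and validation steps above can be sketched with plain Python. Everything about the source data here is an assumption made for illustration: the column names, the DD/MM/YYYY date format, the study identifier, and the function names are all hypothetical. The validation is a bare required-variable check and does not stand in for a real conformance tool. Generating Define.xml is not covered.

```python
from datetime import datetime

# Hypothetical source records; the column names are invented for this sketch.
SOURCE_ROWS = [
    {"subject": "001", "visit_date": "08/04/2022", "sbp": "120", "dbp": "80"},
    {"subject": "002", "visit_date": "09/04/2022", "sbp": "", "dbp": "85"},
]

# Source column -> (VSTESTCD, VSTEST, original unit)
TEST_MAP = {
    "sbp": ("SYSBP", "Systolic Blood Pressure", "mmHg"),
    "dbp": ("DIABP", "Diastolic Blood Pressure", "mmHg"),
}

REQUIRED = ("STUDYID", "DOMAIN", "USUBJID", "VSSEQ", "VSTESTCD", "VSTEST")


def to_vs(rows, study_id):
    """Map source rows to Vital Signs (VS) records, one record per test."""
    records = []
    seq = {}  # running sequence number per subject
    for row in rows:
        usubjid = f"{study_id}-{row['subject']}"
        # Conversion step: assumed DD/MM/YYYY source date -> ISO 8601 date
        vsdtc = datetime.strptime(row["visit_date"], "%d/%m/%Y").date().isoformat()
        for column, (testcd, test, unit) in TEST_MAP.items():
            if not row[column]:
                continue  # simplification: skip results that were not collected
            seq[usubjid] = seq.get(usubjid, 0) + 1
            records.append({
                "STUDYID": study_id,
                "DOMAIN": "VS",
                "USUBJID": usubjid,
                "VSSEQ": seq[usubjid],
                "VSTESTCD": testcd,
                "VSTEST": test,
                "VSORRES": row[column],
                "VSORRESU": unit,
                "VSDTC": vsdtc,
            })
    return records


def validate(records):
    """Return a list of problems; an empty list means the basic check passed."""
    problems = []
    for index, record in enumerate(records):
        for variable in REQUIRED:
            if record.get(variable) in (None, ""):
                problems.append(f"record {index}: missing {variable}")
    return problems


vs_records = to_vs(SOURCE_ROWS, "STUDY01")
print(len(vs_records), "VS records;", len(validate(vs_records)), "problems")
```

Keeping the source-to-SDTM lookup (TEST_MAP) separate from the conversion logic mirrors the list above: the mapping is decided first, and the data processing is applied afterwards.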


Writing an annotated CRF SDTM can now be done using SDTM automation software. The CRFs collect the source dataset from patients in clinical trials. The SDTM domains and variables are then created and configured. SDTM mapping tools can map the source dataset to the SDTM dataset. The source dataset, annotated CRFs, and tabulated SDTM datasets are then included in the clinical trial results submitted to the FDA for review.
