20  Appendix

20.1 Filter Data During Export

REDCapR and PyCap functions have options for filtering data upon export. If a REDCap project has a large amount of data that is slow to export, then we recommend using REDCapR/PyCap functions to filter the data during export when applicable.

Creating custom reports within REDCap and then exporting those filtered reports is another option for filtering the data before export for projects with a large amount of data. See Chapter 9.

20.1.1 Filter By Record ID

Export data for Record IDs 1 and 2.

%%capture
%%R
data_by_record <- redcap_read_oneshot(
    records = c(1,2), 
    redcap_uri = url, 
    token = token
)$data

project.export_records(records=['1','2'],
                       raw_or_label='label',
                       format_type='df').reset_index()
record_id redcap_event_name redcap_repeat_instrument redcap_repeat_instance first_name last_name phone_num zip_code dob age ... cc_phone cc_email close_contacts_complete supervisor_name supervisor_email work_inperson_yesno work_date work_contagious work_contagious_calc work_information_complete
0 1 Personal Info NaN NaN John Doe (999) 999-9999 98105.0 2006-04-11 18.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1 Notifications NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss NaN No NaN No NaN Complete
2 1 Case Intake NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1 Notifications Close Contacts 1.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com Complete NaN NaN NaN NaN NaN NaN NaN
4 1 Notifications Close Contacts 2.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com Complete NaN NaN NaN NaN NaN NaN NaN
5 2 Personal Info NaN NaN Jane Doe (999) 999-9999 98105.0 1994-06-29 29.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 2 Notifications NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss fake_email@gmail.com Yes 2023-10-10 Yes NaN Complete
7 2 Case Intake NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 2 Case Intake NaN 2.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 2 Notifications Close Contacts 1.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com Complete NaN NaN NaN NaN NaN NaN NaN
10 2 Notifications Close Contacts 2.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 NaN Incomplete NaN NaN NaN NaN NaN NaN NaN

11 rows × 48 columns

20.1.2 Filter By Date

Data exports can be filtered by the date the record was added or modified. In the following example, we will view all records that were modified or added after April 9, 2024. For more information on how these records were modified, refer to the logging section of this guide. See Chapter 17.

%%capture
%%R
data_by_date <- redcap_read_oneshot(
    redcap_uri = url, 
    token = token, 
    datetime_range_begin = as.POSIXct("2024/04/09", 
                                    format = "%Y/%m/%d")
)$data

Note: Need to specify date format as seen above using as.POSIXct

from datetime import datetime
project.export_records(date_begin=datetime.fromisoformat("2024-04-09"), 
                       format_type='df')
redcap_repeat_instrument redcap_repeat_instance first_name last_name phone_num zip_code dob age ethnicity race ... cc_phone cc_email close_contacts_complete supervisor_name supervisor_email work_inperson_yesno work_date work_contagious work_contagious_calc work_information_complete
record_id redcap_event_name
1 personal_info_arm_1 NaN NaN John Doe (999) 999-9999 98105.0 2006-04-11 18.0 1.0 4.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss NaN 0.0 NaN 0.0 NaN 2.0
case_intake_arm_1 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com 2.0 NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 2.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com 2.0 NaN NaN NaN NaN NaN NaN NaN
2 personal_info_arm_1 NaN NaN Jane Doe (999) 999-9999 98105.0 1994-06-29 29.0 0.0 5.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss fake_email@gmail.com 1.0 2023-10-10 1.0 NaN 2.0
case_intake_arm_1 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
case_intake_arm_1 NaN 2.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com 2.0 NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 2.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 NaN 0.0 NaN NaN NaN NaN NaN NaN NaN
3 personal_info_arm_1 NaN NaN John Doe (999) 999-9999 98105.0 NaN NaN 1.0 4.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss NaN 0.0 NaN 0.0 NaN 2.0
case_intake_arm_1 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com 2.0 NaN NaN NaN NaN NaN NaN NaN
notifications_arm_1 close_contacts 2.0 NaN NaN NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com 2.0 NaN NaN NaN NaN NaN NaN NaN

16 rows × 46 columns

20.1.3 Filter By Field Value

Export records that reported ‘female’ for gender.

%%capture
%%R
data_by_gender <- redcap_read_oneshot(
    redcap_uri = url, 
    token = token,
    filter_logic = "[gender] = '0'" 
)$data %>%
 select(c(record_id, gender))

Note: When filtering on REDCap’s multiple choice variables, yes/no variables, and checkboxes, you must put quotes around the coded value when using the filter_logic argument, otherwise REDCap will not perform the filtering correctly.

project.export_records(filter_logic="[gender] = 0", 
                       format_type='df'
                      ).reset_index()[['record_id', 'gender']]
record_id gender
0 2 0
1 4 0
2 6 0

Export records that reported an age greater than 20.

%%capture
%%R
data_by_age <- redcap_read_oneshot(
    redcap_uri = url, 
    token = token, 
    filter_logic = "[age] > 20 " 
)$data%>%
 select(c(record_id, age))

Note: Because age is a numeric field in REDCap, it does not need quotes around the number 20.

project.export_records(filter_logic="[age] > 20", 
                       format_type='df'
                      ).reset_index()[['record_id', 'age']]
record_id age
0 2 29
1 4 28
2 5 34
3 6 25

20.2 Export Selected Fields

Export first and last name

%%capture
%%R
#specifying record_id automatically also pulls the event, instrument and instance columns (when applicable).
field_subset_1 <- redcap_read_oneshot(
    fields = c("record_id","first_name","last_name"), 
    redcap_uri = url, 
    token = token
)$data

Note: if record_id is not specified, no identifier fields will be exported. By including record_id in the fields argument, all variables that make up the unique key are automatically exported.

project.export_records(records=['3','4'], 
                       fields=["first_name","last_name"],
                       format_type='df')
redcap_repeat_instrument redcap_repeat_instance first_name last_name
record_id redcap_event_name
3 personal_info_arm_1 NaN NaN John Doe
notifications_arm_1 NaN NaN NaN NaN
case_intake_arm_1 NaN 1.0 NaN NaN
4 personal_info_arm_1 NaN NaN Jane Doe
notifications_arm_1 NaN NaN NaN NaN
case_intake_arm_1 NaN 1.0 NaN NaN
case_intake_arm_1 NaN 2.0 NaN NaN

Note: if record_id is not specified, all fields that make up the unique key will still be exported.

20.3 Export Specific Instruments

See Chapter 13 to get a list of all the instrument names in your project. Specifying instruments to export will still export all rows of the project data (including rows not relevant to the desired instrument). However, it is useful because it will only export the fields (columns) in that instrument. Use the filter_logic argument and the <form>_complete variable (automatically created by REDCap for each form) to get the desired output.

In this example, we use the symptoms_complete field to export the ‘symptoms’ form and all associated data.

%%capture
%%R

records_dem <- redcap_read_oneshot(redcap_uri = url, 
                           fields = "record_id",
                           forms = "symptoms", 
                           filter_logic = '[symptoms_complete] <> ""',
                           token = token
                                       )$data

records_dem_tbl <- gt(head(records_dem))
gt::gtsave(records_dem_tbl, filename = 'export_records6.html', path = "./files/export_files/")

Note: You must add record_id to the fields argument for the data to export with the record_id and associated unique key attached.

project.export_records(forms='symptoms',
                       filter_logic ="[symptoms_complete] <>''",
                       format_type='df')
redcap_repeat_instrument redcap_repeat_instance symptoms_yesno symptom_onset symptoms_exp___1 symptoms_exp___2 symptoms_exp___3 symptoms_exp___4 symptoms_exp___5 symptoms_exp___6 symptoms_exp___7 symptoms_exp___8 symptoms_exp___9 symptoms_exp___10 symptoms_exp___11 symptom_notes symptoms_complete
record_id redcap_event_name
1 case_intake_arm_1 NaN 1 1 2023-10-09 1 1 0 0 0 0 1 0 0 0 0 NaN 2
2 case_intake_arm_1 NaN 1 1 2023-10-11 1 1 0 0 0 0 0 1 0 0 0 NaN 2
case_intake_arm_1 NaN 2 1 2021-06-05 1 1 1 1 1 0 1 1 1 0 0 NaN 2
3 case_intake_arm_1 NaN 1 1 2023-10-09 1 1 0 0 0 0 1 0 0 0 0 NaN 2
4 case_intake_arm_1 NaN 1 1 2023-10-11 1 1 0 0 0 0 0 1 0 0 0 NaN 2
case_intake_arm_1 NaN 2 1 2021-06-05 1 1 1 1 1 0 1 1 1 0 0 NaN 2
5 case_intake_arm_1 NaN 1 1 2023-10-08 1 0 0 0 0 0 0 0 0 0 0 NaN 2
6 case_intake_arm_1 NaN 1 1 2023-10-02 0 0 0 0 0 1 1 1 1 0 0 NaN 2

Note: if record_id is not specified, all fields that make up the unique key will still be exported.

20.4 Export Data as CSV (PyCap Only)

In Section 8.2 and Section 9.2, it was noted that to export labeled headers, the data needs to be exported as a csv. See the example below on how to do this.

from io import StringIO

data_csv = StringIO(project.export_records(records='2',
                       raw_or_label='label',
                       raw_or_label_headers='label', 
                       format_type='csv'))
df_csv = pd.read_csv(data_csv, sep=',')
df_csv
Record ID Event Name Repeat Instrument Repeat Instance First Name Last Name Phone Number ZIP Code Date of birth Age (years) ... Phone number of close contact Email of close contact Complete?.3 Supervisor Name Supervisor email Do you ever work in-person? Date you last worked in-person? Was this person at work while contagious? Symptom Onset Date: [case_intake_arm_1][symptom_onset] Was this person at work while contagious? Complete?.4
0 2 Personal Info NaN NaN Jane Doe (999) 999-9999 98105.0 1994-06-29 29.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2 Notifications NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN Boss fake_email@gmail.com Yes 2023-10-10 Yes NaN Complete
2 2 Case Intake NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2 Case Intake NaN 2.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2 Notifications Close Contacts 1.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 fake_email@gmail.com Complete NaN NaN NaN NaN NaN NaN NaN
5 2 Notifications Close Contacts 2.0 NaN NaN NaN NaN NaN NaN ... (999) 999-9999 NaN Incomplete NaN NaN NaN NaN NaN NaN NaN

6 rows × 48 columns

20.5 Creating a Reference Class (REDCapR only)

This Reference Class represents a REDCap project. Once some values are set that are specific to a REDCap project (such as the URI and token), later calls are less verbose (such as reading and writing data).

First, define the project:

%%capture
%%R
project <- REDCapR::redcap_project$new(redcap_uri=url, token=token)
ds_all <- project$read()
%%capture
%%R
data <- project$read(fields = c("record_id", "gender", "first_name"))$data

Pull the record_id for all ‘female’ records:

%%R

record_of_females <- data$record_id[data$gender=='0']
record_of_females
 [1] NA NA  2 NA NA NA NA NA  4 NA NA NA NA NA  6 NA NA

20.6 Clean Checkbox Choices (REDCapR Only)

REDCapR has a checkbox_choices function that can be used to neatly list all answer options for a checkbox field.

First pull the metadata:

%%capture
%%R

metadata <- redcap_metadata_read(
    redcap_uri = url, 
    token = token
)$data

Select the checkbox field that you would like to view:

%%R
symptoms_exp <- metadata[metadata$field_name == "symptoms_exp",]$select_choices_or_calculations
symptoms_exp_list <- REDCapR::checkbox_choices(select_choices=symptoms_exp)
symptoms_exp_list
# A tibble: 11 x 2
   id    label              
   <chr> <chr>              
 1 1     Sore Throat        
 2 2     Cough              
 3 3     Shortness of Breath
 4 4     Chest Pain         
 5 5     Headache           
 6 6     Runny Nose         
 7 7     Congestion         
 8 8     Fever              
 9 9     Body Aches         
10 10    Nausea/Vomiting    
11 11    Diarrhea           

20.7 REDCap Constants (REDCapR Only)

You can quickly search for the numerical values of several ‘constants’ within REDCap. For example, when exporting data from REDCap, all instruments end with a variable called <form_name>_complete that when exported as raw data will take the values of 0, 1 or 2. You can see what each numerical value means by using constant("form_incomplete") and constant("form_complete").

Other constants across all REDCap projects include the values for the various user rights settings. For a full list of constants available, refer to the following documentation.

View the <form_name>_complete REDCap constant values:

%%R
REDCapR::constant(c(
  "form_incomplete",
  "form_complete",
  "form_unverified"
))
[1] 0 2 1