%%R
<- read.csv("./files/data_to_import.csv") df_to_import
27 Appendix
27.1 Limitations to Importing
Field Names
- Field names cannot be imported using REDCapR.
- Can be exported using redcap_variables()
.
Forms/Instruments
- Forms and instruments cannot be imported using REDCapR. Can be uploaded as a ZIP file in the REDCap web application under the ‘Designer’ page of the REDCap Project.
- Can be downloaded as pdf using redcap_instrument_download()
.
Instrument/Event Map
- Instrument/event mapping cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘Designate Instruments for My Events’ on the ‘Project Setup’ page of the REDCap project.
- Can be exported using redcap_event_instruments()
.
Reports
- Cannot be imported using REDCapR.
- Can be exported using redcap_report()
.
Users
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘User Rights’ page of the REDCap Project.
- Can be exported using redcap_users_export()
.
User Roles
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘User Rights’ page of the REDCap Project.
- Can be exported using redcap_users_export()
.
Data Access Groups (DAGs)
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV inthe REDCap web application under the ‘DAGs’ page of the REDCap Project.
- Can be exported using redcap_dag_read()
.
Logging
- Cannot be imported.
- Can be exported using redcap_log_read()
.
Field Names
- Field names alone cannot be imported using PyCap.
- Can be exported using export_field_names()
.
Forms/Instruments
- Forms and instruments cannot be imported using PyCap. Can be uploaded as a ZIP file in the REDCap web application under the ‘Designer’ page of the REDCap Project.
- Can be exported using export_instruments()
and export_repeating_instruments_events()
for the settings.
Reports
- Reports cannot be imported using PyCap.
- Can be exported using export_records()
.
Logging
- Logging cannot be imported using PyCap.
- Can be exported using export_logging()
.
27.2 General Import notes (regardless of package used)
27.2.1 REDCap Validations
Identifiers
- No duplicates in the unique key.
- Note: Duplicates in the unique key will be automatically dropped during import (only one occurence of the duplicate will be uploaded) without any warning or error message. The output message only lists the number of unique ids that were imported.
- If
redcap_repeat_instrument
is part of the unique key its value must be valid and associated with the correct event.- Exception for longitudinal projects with repeating events and no repeating instruments: Data exported form these projects will have the ‘redcap_repeat_instrument’ column with all values set to NA. This column is technically not needed in this type of project and API import will be accepted with (as long as all values are NA) or without it.
Data Structure
That non-missing fields are in the correct instrument/event row (for longitudinal projects or projects with repeating instruments).
That ‘redcap_repeat_instance’ is filled when required and missing when required (for projects with repeating events or instruments).
That there are no extra columns in the data being imported that are not fields in the REDCap project.
Values
- Radio and Dropdown Fields
- That values are within the set of
select_choices_or_calculations
as defined in the metadata.
- That values are within the set of
- Text Fields
- That only numeric values are present in numeric validation fields.
- That only integer values are present in integer validation fields.
- That date and datetime fields are in the YMD format (can be YYYY/MM/DD or YYYY-MM-DD).
- That email fields are formatted as ‘something’ + @ + ‘something’ + . + ‘something’ (does not check for valid domain names; multiple ‘.somethings’ are accepted after @ symbol (e.g., email@doh.wa.gov)).
- That phone numbers have 10 digits and that the area code starts with digits 2-9 (can have ###-###-#### format or just ##########).
- That zipcode fields have either 5 or 9 digits, and if 9 digits there is a hyphen after the 5th digit.
- That ‘alpha only’ text validation fields contain only letters (no spaces, numbers, or punctuation).
- That there are only 10 digits in the fields with MRN 10 digit validation fields.
- That there are only digits and ‘-’ or ‘_’ in fields with MRN generic validation fields.
- That only numeric values are present in numeric validation fields.
- Slider Fields
- That slider field values are within the
text_validation_min
andtext_validation_max
range as defined in the metadata.
- That slider field values are within the
- Checkbox, True/False, Yes/No Fields
- Values must be 0 or 1 or missing.
- Text Box (no validation) and Notes box
- Values are less than 65,000 characters.
- Calculated Fields
- Regardless of the value in the data attempting to be imported, REDCap will auto-calculate these fields (assuming all of the inputs to the calculation are available). The values being imported are ignored.
- Files and Signature Fields
- Regardless of the value in the data attempting to be imported, REDCap will ignore these fields when importing records. All file-type field imports are done through a separate process.
- Dynamic Query Fields
- That values are within the dynamic query options (value options not exported in metadata).
NOT Validated
A second instance of a repeating event or instrument can be uploaded without a first instance exisiting (in import data or existing project data).
Data can be imported outside of the specified validation range for text fields with the following validation types: dates, times, datetimes, integers, and numbers. Recommendation: run Data Quality rule D after import.
Missing required fields can be imported. Recommendation: run Data Quality rule B after upload.
Incorrect calculated field values can be imported. Recommendation: run Data Quality rule H after upload.
Files and Signature fields can only be imported through the file import method, any data attempting to be imported using the record import method is ignored.
Field values that violate the REDCap project branching logic can be imported via the API without issues, however, when you go to open that record in REDCap it will alert you to the invalid logic.
REDCap Log Behavior
If data for an existing record is imported via the API but the incoming fields for that records are the same values as the fields already stored for that record in REDCap (no changes made), the API import will run, however, there will be no ‘Update record (API)’ action logged for this record.
If a row containing a valid combination of unique fields but NA accross all other fields is imported via API, the import will run. However, this ‘blank’ record will not be created and there will be no action logged in the REDCap log for this record..
27.3 Data Validation Tools
REDCapR has a few data validation functions that can be used to check your data before importing it to your REDCap project. These validations will not be specific to your paricular REDCap project but are general validations that apply to all REDCap projects.
For example, you can check if you have any boolean values (True/False) since REDCap will only accept a raw data import of 0/1 integers. You can also check for duplicates and unique IDs. You can view more details on these data validation functions here.
27.4 Example: Uploading Records from a CSV
In this example, we have a csv named “data_to_import.csv” with records to upload.
There are multiple rows per record because this project is longitudinal with repeat instruments and events.
%%R
# view which record_id's are currently being used in the data set to import.
unique(df_to_import$record_id)
[1] 3 4 5 6
In the dataframe we will import, the record IDs are 3-6. However, these record IDs already exist in the REDCap project and importing this data would overwrite the exisiting record IDs 3-6. If we want to import these as new records, we will need to renumber the record IDs.
%%R
# start by getting the next available record_id
<- redcap_next_free_record_name(redcap_uri=url, token=token) next_record
R[write to console]: The next free record name in REDCap was successfully determined in 1.1 seconds. The http status code was 200. Is is 10.
%%R
### sequence the df_to_import records starting at one
<- df_to_import[order(df_to_import$record_id), , drop = FALSE]
df_to_import <- as.numeric(factor(df_to_import$record_id)) df_to_import$seq
%%capture
%%R
<- df_to_import %>% group_by(record_id, seq) %>% summarize(n=n()) sequencing
%%R
head(sequencing)
# A tibble: 4 x 3
# Groups: record_id [4]
record_id seq n
<int> <dbl> <int>
1 3 1 5
2 4 2 6
3 5 3 4
4 6 4 5
%%R
# Adjust record IDs to start at the next available record_id
<- as.numeric(df_to_import$seq) + (as.numeric(next_record)-1)
df_to_import$record_id unique(df_to_import$record_id)
[1] 10 11 12 13
The record IDs have been changed to new record IDs that don’t already exist in the REDCap project.
%%R
# Remove the seq var that was created above
<- df_to_import %>% select(-seq) df_to_import
Formatting Date Fields
Date fields in REDCap are character fields with a designated date validation added. There are many different types of date validations/formats that can be chosen for a date field. All date fields must be imported to REDCap formatted as YYYY-MM-DD
, regardless of the specific date format designated for this field in the REDCap project. (PyCap has an import records argument to change the default YMD format, but REDCapR does not have this option.) Below is an example on how to use the project metadata to isolate and format all date fields before importing data.
%%R
# Export metadata
<- redcap_metadata_read(redcap_uri = url, token = token)$data metadata
R[write to console]: The data dictionary describing 30 fields was read from REDCap in 0.2 seconds. The http status code was 200.
Note that the ‘text_validation_type_or_show_slider_number’ field in the metadata is where the date format is specified.
%%R
unique(metadata$text_validation_type_or_show_slider_number)
[1] NA "phone" "integer" "date_mdy" "email"
%%R
# Isolate all field_names in the metadata that have any date validation
<- metadata %>% filter(grepl("date", text_validation_type_or_show_slider_number)) %>% select(field_name) date_fields
%%R
# Make a list of all the date fields
<- (date_fields$field_name)
date_list date_list
[1] "dob" "symptom_onset" "test_positive_date"
[4] "prior_covid_date" "cc_date" "work_date"
%%R
# mutate across all date fields to get the desired Y-M-D format.
<- df_to_import %>%
df_to_import2 ~as.Date(., "%m/%d/%Y" ))) mutate(across(all_of(date_list),
%%R
unique(df_to_import2$test_positive_date)
[1] NA "2023-10-10" "2023-10-12" "2021-06-07" "2023-10-03"
Now import the new records.
%%capture --no-stdout
%%R
=url, token=token) redcap_write(df_to_import2, redcap_uri
$success
[1] TRUE
$status_code
[1] "200"
$outcome_message
[1] "4 records were written to REDCap in 1.8 seconds."
$records_affected_count
[1] 4
$affected_ids
[1] "10" "11" "12" "13"
$elapsed_seconds
[1] 2.354373
When reading a csv as a pandas dataframe, Python will take any numeric column with missing data and convert them to float with NaN inserted in the blank cells. In longitudinal projects, we expect many blank cells since the data is wide, as only columns relevant to that event/instrument field are filled out in each row. Many of REDCap’s field types (checkbox, yes/no, radio, and form_complete variables) are integers. Pandas will convert these columns to float variables with a decimal place added (e.g. 1.0 instead of 1 for ‘Yes’ in a yes/no field) and importing this to REDCap will fail.
# Read and view data to import
= pd.read_csv("./files/data_to_import.csv")
df_to_import df_to_import.head()
record_id | redcap_event_name | redcap_repeat_instrument | redcap_repeat_instance | redcap_survey_identifier | demographics_timestamp | first_name | last_name | phone_num | zip_code | ... | cc_phone | cc_email | close_contacts_complete | supervisor_name | supervisor_email | work_inperson_yesno | work_date | work_contagious | work_contagious_calc | work_information_complete | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | personal_info_arm_1 | NaN | NaN | NaN | NaN | John | Doe | (999) 999-9999 | 98105.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 3 | notifications_arm_1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | Boss | NaN | 0.0 | NaN | 0.0 | NaN | 2.0 |
2 | 3 | case_intake_arm_1 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 3 | notifications_arm_1 | close_contacts | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | (999) 999-9999 | fake_email@gmail.com | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 3 | notifications_arm_1 | close_contacts | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | (999) 999-9999 | fake_email@gmail.com | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 50 columns
Notice how the redcap_repeat_instance, close_contacts_complete, and work_inperson_yesno are some of the many fields that were converted to float with an added decimal. Importing this dataset as-is will produce errors.
Solution: Convert all floats to Int64 Pandas datatype.
Int64 is a unique pandas datatype that allows numeric fields to contain missing values. For more information, read the documentation here.
Note: Before applying this solution, ensure that there are no numeric fields in your REDCap project that should have decimals (you will not want to convert these variables to int64 since they would lose their decimal places). Make sure you are familiar with your project’s metadata. All radio, checkboxes, yes/no, redcap_repeat_instance, and form_complete variables need to be integers. In REDCap, actual numeric fields are stored as text fields with optional validation. Any text field in REDCap with no validation or with ‘numeric’ as their validation type will accept numbers with decimal places. Any text fields with other validations types (i.e. zip code, phone number, integer) will not accept decimals.
= df_to_import.select_dtypes(include=[np.float64]).columns.values.tolist()
float_list print(float_list)
['redcap_repeat_instance', 'redcap_survey_identifier', 'demographics_timestamp', 'zip_code', 'age', 'ethnicity', 'race', 'gender', 'demographics_complete', 'symptoms_yesno', 'symptoms_exp___1', 'symptoms_exp___2', 'symptoms_exp___3', 'symptoms_exp___4', 'symptoms_exp___5', 'symptoms_exp___6', 'symptoms_exp___7', 'symptoms_exp___8', 'symptoms_exp___9', 'symptoms_exp___10', 'symptoms_exp___11', 'symptom_notes', 'symptoms_complete', 'test_yesno', 'test_positive_yesno', 'prior_covid_yesno', 'test_information_complete', 'close_contacts_complete', 'work_inperson_yesno', 'work_contagious', 'work_contagious_calc', 'work_information_complete']
At this point, if needed, you can remove any variables from this list that you need to keep as a float.
= df_to_import[float_list].apply(lambda x: x.astype("Int64"))
df_to_import[float_list] df_to_import.head()
record_id | redcap_event_name | redcap_repeat_instrument | redcap_repeat_instance | redcap_survey_identifier | demographics_timestamp | first_name | last_name | phone_num | zip_code | ... | cc_phone | cc_email | close_contacts_complete | supervisor_name | supervisor_email | work_inperson_yesno | work_date | work_contagious | work_contagious_calc | work_information_complete | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | personal_info_arm_1 | NaN | <NA> | <NA> | <NA> | John | Doe | (999) 999-9999 | 98105 | ... | NaN | NaN | <NA> | NaN | NaN | <NA> | NaN | <NA> | <NA> | <NA> |
1 | 3 | notifications_arm_1 | NaN | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | ... | NaN | NaN | <NA> | Boss | NaN | 0 | NaN | 0 | <NA> | 2 |
2 | 3 | case_intake_arm_1 | NaN | 1 | <NA> | <NA> | NaN | NaN | NaN | <NA> | ... | NaN | NaN | <NA> | NaN | NaN | <NA> | NaN | <NA> | <NA> | <NA> |
3 | 3 | notifications_arm_1 | close_contacts | 1 | <NA> | <NA> | NaN | NaN | NaN | <NA> | ... | (999) 999-9999 | fake_email@gmail.com | 2 | NaN | NaN | <NA> | NaN | <NA> | <NA> | <NA> |
4 | 3 | notifications_arm_1 | close_contacts | 2 | <NA> | <NA> | NaN | NaN | NaN | <NA> | ... | (999) 999-9999 | fake_email@gmail.com | 2 | NaN | NaN | <NA> | NaN | <NA> | <NA> | <NA> |
5 rows × 50 columns
You can now see the redcap_repeat_instance
and close_contacts_complete
fields are integers. The <NA>
seen in the blank cells will not interfere with data import. Now you can make any other edits necessary and import the data successfully.
# Import data
= 'MDY', import_format = 'df') project.import_records(df_to_import, date_format
{'count': 4}