27 Appendix

27.1 Limitations to Importing

R
Python

Field Names
- Field names cannot be imported using REDCapR.
- Can be exported using redcap_variables().

Forms/Instruments
- Forms and instruments cannot be imported using REDCapR. Can be uploaded as a ZIP file in the REDCap web application under the ‘Designer’ page of the REDCap Project.
- Can be downloaded as pdf using redcap_instrument_download().

Instrument/Event Map
- Instrument/event mapping cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘Designate Instruments for My Events’ on the ‘Project Setup’ page of the REDCap project.
- Can be exported using redcap_event_instruments().

Reports
- Cannot be imported using REDCapR.
- Can be exported using redcap_report().

Users
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘User Rights’ page of the REDCap Project.
- Can be exported using redcap_users_export().

User Roles
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV in the REDCap web application under the ‘User Rights’ page of the REDCap Project.
- Can be exported using redcap_users_export().

Data Access Groups (DAGs)
- Cannot be imported using REDCapR. Can be imported with REDCap’s native API or uploaded as a CSV inthe REDCap web application under the ‘DAGs’ page of the REDCap Project.
- Can be exported using redcap_dag_read().

Logging
- Cannot be imported.
- Can be exported using redcap_log_read().

Field Names
- Field names alone cannot be imported using PyCap.

- Can be exported using export_field_names().

Forms/Instruments
- Forms and instruments cannot be imported using PyCap. Can be uploaded as a ZIP file in the REDCap web application under the ‘Designer’ page of the REDCap Project.
- Can be exported using export_instruments() and export_repeating_instruments_events() for the settings.

Reports
- Reports cannot be imported using PyCap.

- Can be exported using export_records().

Logging
- Logging cannot be imported using PyCap.

- Can be exported using export_logging().

27.2 General Import notes (regardless of package used)

27.2.1 REDCap Validations

Identifiers

No duplicates in the unique key.
- Note: Duplicates in the unique key will be automatically dropped during import (only one occurence of the duplicate will be uploaded) without any warning or error message. The output message only lists the number of unique ids that were imported.
If redcap_repeat_instrument is part of the unique key its value must be valid and associated with the correct event.
- Exception for longitudinal projects with repeating events and no repeating instruments: Data exported form these projects will have the ‘redcap_repeat_instrument’ column with all values set to NA. This column is technically not needed in this type of project and API import will be accepted with (as long as all values are NA) or without it.

Data Structure

That non-missing fields are in the correct instrument/event row (for longitudinal projects or projects with repeating instruments).
That ‘redcap_repeat_instance’ is filled when required and missing when required (for projects with repeating events or instruments).
That there are no extra columns in the data being imported that are not fields in the REDCap project.

Values

Radio and Dropdown Fields
- That values are within the set of select_choices_or_calculations as defined in the metadata.
Text Fields
- That only numeric values are present in numeric validation fields.
- That only integer values are present in integer validation fields.
- That date and datetime fields are in the YMD format (can be YYYY/MM/DD or YYYY-MM-DD).
- That email fields are formatted as ‘something’ + @ + ‘something’ + . + ‘something’ (does not check for valid domain names; multiple ‘.somethings’ are accepted after @ symbol (e.g., email@doh.wa.gov)).
- That phone numbers have 10 digits and that the area code starts with digits 2-9 (can have ###-###-#### format or just ##########).
- That zipcode fields have either 5 or 9 digits, and if 9 digits there is a hyphen after the 5th digit.
- That ‘alpha only’ text validation fields contain only letters (no spaces, numbers, or punctuation).
- That there are only 10 digits in the fields with MRN 10 digit validation fields.
- That there are only digits and ‘-’ or ‘_’ in fields with MRN generic validation fields.
Slider Fields
- That slider field values are within the text_validation_min and text_validation_max range as defined in the metadata.
Checkbox, True/False, Yes/No Fields
- Values must be 0 or 1 or missing.
Text Box (no validation) and Notes box
- Values are less than 65,000 characters.
Calculated Fields
- Regardless of the value in the data attempting to be imported, REDCap will auto-calculate these fields (assuming all of the inputs to the calculation are available). The values being imported are ignored.
Files and Signature Fields
- Regardless of the value in the data attempting to be imported, REDCap will ignore these fields when importing records. All file-type field imports are done through a separate process.
Dynamic Query Fields
- That values are within the dynamic query options (value options not exported in metadata).

NOT Validated

A second instance of a repeating event or instrument can be uploaded without a first instance exisiting (in import data or existing project data).
Data can be imported outside of the specified validation range for text fields with the following validation types: dates, times, datetimes, integers, and numbers. Recommendation: run Data Quality rule D after import.
Missing required fields can be imported. Recommendation: run Data Quality rule B after upload.
Incorrect calculated field values can be imported. Recommendation: run Data Quality rule H after upload.
Files and Signature fields can only be imported through the file import method, any data attempting to be imported using the record import method is ignored.
Field values that violate the REDCap project branching logic can be imported via the API without issues, however, when you go to open that record in REDCap it will alert you to the invalid logic.

REDCap Log Behavior

If data for an existing record is imported via the API but the incoming fields for that records are the same values as the fields already stored for that record in REDCap (no changes made), the API import will run, however, there will be no ‘Update record (API)’ action logged for this record.
If a row containing a valid combination of unique fields but NA accross all other fields is imported via API, the import will run. However, this ‘blank’ record will not be created and there will be no action logged in the REDCap log for this record..

27.3 Data Validation Tools

REDCapR has a few data validation functions that can be used to check your data before importing it to your REDCap project. These validations will not be specific to your paricular REDCap project but are general validations that apply to all REDCap projects.

For example, you can check if you have any boolean values (True/False) since REDCap will only accept a raw data import of 0/1 integers. You can also check for duplicates and unique IDs. You can view more details on these data validation functions here.

27.4 Example: Uploading Records from a CSV

In this example, we have a csv named “data_to_import.csv” with records to upload.

R
Python

%%R
df_to_import <- read.csv("./files/data_to_import.csv")

There are multiple rows per record because this project is longitudinal with repeat instruments and events.

%%R
# view which record_id's are currently being used in the data set to import. 
unique(df_to_import$record_id)

[1] 3 4 5 6

In the dataframe we will import, the record IDs are 3-6. However, these record IDs already exist in the REDCap project and importing this data would overwrite the exisiting record IDs 3-6. If we want to import these as new records, we will need to renumber the record IDs.

%%R
# start by getting the next available record_id
next_record <- redcap_next_free_record_name(redcap_uri=url, token=token)

R[write to console]: The next free record name in REDCap was successfully determined in 1.1 seconds.  The http status code was 200.  Is is 10.

%%R
### sequence the df_to_import records starting at one
df_to_import <- df_to_import[order(df_to_import$record_id), , drop = FALSE]
df_to_import$seq <- as.numeric(factor(df_to_import$record_id))

%%capture
%%R
sequencing <- df_to_import %>% group_by(record_id, seq) %>% summarize(n=n())

%%R 
head(sequencing)

# A tibble: 4 x 3
# Groups:   record_id [4]
  record_id   seq     n
      <int> <dbl> <int>
1         3     1     5
2         4     2     6
3         5     3     4
4         6     4     5

%%R
# Adjust record IDs to start at the next available record_id
df_to_import$record_id <- as.numeric(df_to_import$seq) + (as.numeric(next_record)-1)
unique(df_to_import$record_id)

[1] 10 11 12 13

The record IDs have been changed to new record IDs that don’t already exist in the REDCap project.

%%R
# Remove the seq var that was created above
df_to_import <- df_to_import %>% select(-seq)

Formatting Date Fields

Date fields in REDCap are character fields with a designated date validation added. There are many different types of date validations/formats that can be chosen for a date field. All date fields must be imported to REDCap formatted as YYYY-MM-DD, regardless of the specific date format designated for this field in the REDCap project. (PyCap has an import records argument to change the default YMD format, but REDCapR does not have this option.) Below is an example on how to use the project metadata to isolate and format all date fields before importing data.

%%R
# Export metadata
metadata <- redcap_metadata_read(redcap_uri = url, token = token)$data

R[write to console]: The data dictionary describing 30 fields was read from REDCap in 0.2 seconds.  The http status code was 200.

Note that the ‘text_validation_type_or_show_slider_number’ field in the metadata is where the date format is specified.

%%R
unique(metadata$text_validation_type_or_show_slider_number)

[1] NA         "phone"    "integer"  "date_mdy" "email"

%%R
# Isolate all field_names in the metadata that have any date validation 
date_fields <- metadata %>% filter(grepl("date", text_validation_type_or_show_slider_number)) %>% select(field_name)

%%R
# Make a list of all the date fields
date_list <- (date_fields$field_name)
date_list

[1] "dob"                "symptom_onset"      "test_positive_date"
[4] "prior_covid_date"   "cc_date"            "work_date"

%%R
# mutate across all date fields to get the desired Y-M-D format.  
df_to_import2 <- df_to_import %>%
  mutate(across(all_of(date_list), ~as.Date(., "%m/%d/%Y" )))

%%R
unique(df_to_import2$test_positive_date)

[1] NA           "2023-10-10" "2023-10-12" "2021-06-07" "2023-10-03"

Now import the new records.

%%capture --no-stdout
%%R
redcap_write(df_to_import2, redcap_uri=url, token=token)

$success
[1] TRUE

$status_code
[1] "200"

$outcome_message
[1] "4 records were written to REDCap in 1.8 seconds."

$records_affected_count
[1] 4

$affected_ids
[1] "10" "11" "12" "13"

$elapsed_seconds
[1] 2.354373

When reading a csv as a pandas dataframe, Python will take any numeric column with missing data and convert them to float with NaN inserted in the blank cells. In longitudinal projects, we expect many blank cells since the data is wide, as only columns relevant to that event/instrument field are filled out in each row. Many of REDCap’s field types (checkbox, yes/no, radio, and form_complete variables) are integers. Pandas will convert these columns to float variables with a decimal place added (e.g. 1.0 instead of 1 for ‘Yes’ in a yes/no field) and importing this to REDCap will fail.

# Read and view data to import
df_to_import = pd.read_csv("./files/data_to_import.csv")
df_to_import.head()

	record_id	redcap_event_name	redcap_repeat_instrument	redcap_repeat_instance	redcap_survey_identifier	demographics_timestamp	first_name	last_name	phone_num	zip_code	...	cc_phone	cc_email	close_contacts_complete	supervisor_name	supervisor_email	work_inperson_yesno	work_date	work_contagious	work_contagious_calc	work_information_complete
0	3	personal_info_arm_1	NaN	NaN	NaN	NaN	John	Doe	(999) 999-9999	98105.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	3	notifications_arm_1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	Boss	NaN	0.0	NaN	0.0	NaN	2.0
2	3	case_intake_arm_1	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	3	notifications_arm_1	close_contacts	1.0	NaN	NaN	NaN	NaN	NaN	NaN	...	(999) 999-9999	fake_email@gmail.com	2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	3	notifications_arm_1	close_contacts	2.0	NaN	NaN	NaN	NaN	NaN	NaN	...	(999) 999-9999	fake_email@gmail.com	2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 50 columns

Notice how the redcap_repeat_instance, close_contacts_complete, and work_inperson_yesno are some of the many fields that were converted to float with an added decimal. Importing this dataset as-is will produce errors.

Solution: Convert all floats to Int64 Pandas datatype.

Int64 is a unique pandas datatype that allows numeric fields to contain missing values. For more information, read the documentation here.
Note: Before applying this solution, ensure that there are no numeric fields in your REDCap project that should have decimals (you will not want to convert these variables to int64 since they would lose their decimal places). Make sure you are familiar with your project’s metadata. All radio, checkboxes, yes/no, redcap_repeat_instance, and form_complete variables need to be integers. In REDCap, actual numeric fields are stored as text fields with optional validation. Any text field in REDCap with no validation or with ‘numeric’ as their validation type will accept numbers with decimal places. Any text fields with other validations types (i.e. zip code, phone number, integer) will not accept decimals.

float_list = df_to_import.select_dtypes(include=[np.float64]).columns.values.tolist()
print(float_list)

['redcap_repeat_instance', 'redcap_survey_identifier', 'demographics_timestamp', 'zip_code', 'age', 'ethnicity', 'race', 'gender', 'demographics_complete', 'symptoms_yesno', 'symptoms_exp___1', 'symptoms_exp___2', 'symptoms_exp___3', 'symptoms_exp___4', 'symptoms_exp___5', 'symptoms_exp___6', 'symptoms_exp___7', 'symptoms_exp___8', 'symptoms_exp___9', 'symptoms_exp___10', 'symptoms_exp___11', 'symptom_notes', 'symptoms_complete', 'test_yesno', 'test_positive_yesno', 'prior_covid_yesno', 'test_information_complete', 'close_contacts_complete', 'work_inperson_yesno', 'work_contagious', 'work_contagious_calc', 'work_information_complete']

At this point, if needed, you can remove any variables from this list that you need to keep as a float.

df_to_import[float_list] = df_to_import[float_list].apply(lambda x: x.astype("Int64"))
df_to_import.head()

	record_id	redcap_event_name	redcap_repeat_instrument	redcap_repeat_instance	redcap_survey_identifier	demographics_timestamp	first_name	last_name	phone_num	zip_code	...	cc_phone	cc_email	close_contacts_complete	supervisor_name	supervisor_email	work_inperson_yesno	work_date	work_contagious	work_contagious_calc	work_information_complete
0	3	personal_info_arm_1	NaN	<NA>	<NA>	<NA>	John	Doe	(999) 999-9999	98105	...	NaN	NaN	<NA>	NaN	NaN	<NA>	NaN	<NA>	<NA>	<NA>
1	3	notifications_arm_1	NaN	<NA>	<NA>	<NA>	NaN	NaN	NaN	<NA>	...	NaN	NaN	<NA>	Boss	NaN	0	NaN	0	<NA>	2
2	3	case_intake_arm_1	NaN	1	<NA>	<NA>	NaN	NaN	NaN	<NA>	...	NaN	NaN	<NA>	NaN	NaN	<NA>	NaN	<NA>	<NA>	<NA>
3	3	notifications_arm_1	close_contacts	1	<NA>	<NA>	NaN	NaN	NaN	<NA>	...	(999) 999-9999	fake_email@gmail.com	2	NaN	NaN	<NA>	NaN	<NA>	<NA>	<NA>
4	3	notifications_arm_1	close_contacts	2	<NA>	<NA>	NaN	NaN	NaN	<NA>	...	(999) 999-9999	fake_email@gmail.com	2	NaN	NaN	<NA>	NaN	<NA>	<NA>	<NA>

5 rows × 50 columns

You can now see the redcap_repeat_instance and close_contacts_complete fields are integers. The <NA> seen in the blank cells will not interfere with data import. Now you can make any other edits necessary and import the data successfully.

# Import data
project.import_records(df_to_import, date_format = 'MDY', import_format = 'df')

{'count': 4}