Submission-Grade Statistical Programming Example (SAS + R)
ADaM ADSL Build + Independent QC + Reproducible Run Artifacts
Jonathan D. Stallings, PhD, MS
December 24, 2025
- 1 Purpose and scope
- 2 Project layout (recommended)
- 3 Example dataset logic (ADSL)
- 4 Section A — Run Driver (controlled execution)
- 5 Section B — Build ADSL (derivation program)
- 6 Section C — Independent QC (dual programming pattern)
- 7 Section D — “Submission-grade” completeness checklist (what you now have)
1 Purpose and scope
This document provides a submission-grade programming example demonstrating how regulated intent becomes executable code. It includes:
- ADaM ADSL build from SDTM-like inputs (DM, EX)
- Independent QC build and comparison
- Run artifacts supporting audit readiness: manifest, checks, and deterministic outputs
- SAS and R implementations with aligned logic and best-practice commentary
Important scope note
This example focuses on: (1) dataset derivation rigor, (2) audit-friendly execution, and (3) QC independence. In real submission packages, additional study-specific artifacts exist (SAP, dataset specs, Define-XML, SDRG, controlled terminology libraries). Those items are intentionally not fully generated here to keep the example readable and website-friendly.
2 Project layout
A clean repo layout supports traceability and controlled execution.
project/
programs/
sas/
00_run_driver.sas
01_build_adsl.sas
02_qc_adsl.sas
r/
00_run_driver.R
01_build_adsl.R
02_qc_adsl.R
data/
sdtm/ (read-only in regulated runs)
adam/ (write outputs here)
outputs/
logs/
manifests/
qc/
docs/
(optional) specs, shells, notes
Industry best practice alignment
- Separation of concerns: one program per dataset (build), plus an independent QC program.
- Controlled I/O: explicit, parameterized paths; read-only inputs; deterministic outputs.
- Audit readiness: run manifests, logs, checksums, and QC reports produced every run.
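The checksum artifact mentioned above is straightforward to add. A minimal R sketch, assuming the path conventions of the run driver; the file names and columns are illustrative, not a required manifest schema.
# Record input checksums so a reviewer can confirm the exact files used in a run.
# tools::md5sum() ships with base R; sdtm_files and the output path are assumptions.
sdtm_files <- c(dm = "data/sdtm/dm.csv", ex = "data/sdtm/ex.csv")
checksums <- data.frame(
  file = unname(sdtm_files),
  md5  = unname(tools::md5sum(sdtm_files)),
  stringsAsFactors = FALSE
)
write.csv(checksums, "outputs/manifests/input_checksums.csv", row.names = FALSE)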
3 Example dataset logic (ADSL)
This example derives a minimal but realistic ADSL:
- Keys: STUDYID, USUBJID
- Dates: RANDDT, TRTSDT, TRTEDT
- Treatment: TRT01P, TRT01A
- Population: SAFFL (any exposure)
Assumptions (declared in code):
- SDTM-like inputs have ISO 8601 date strings (YYYY-MM-DD...); see the date-parsing sketch below
- One primary exposure treatment per subject (simplified)
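A minimal R sketch of the date assumption, using an illustrative iso_to_date_strict() helper (not part of the programs below): with an explicit format, a partial date parses to NA rather than silently shifting, so downstream checks can catch it.
# Strict ISO 8601 parsing: complete dates parse; partial dates become NA
iso_to_date_strict <- function(x) as.Date(substr(x, 1, 10), format = "%Y-%m-%d")
iso_to_date_strict(c("2024-03-15T08:30", "2024-03", NA))
# returns 2024-03-15, NA, NA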
4 Section A — Run Driver (controlled execution)
4.1 Why this matters (best practice)
A “submission-grade” pipeline has a driver that:
- centralizes parameters (study ID, paths, run ID)
- enforces input existence and write locations
- produces a run manifest (what ran, when, with what inputs)
- stops on failure (no silent partial runs)
4.2 SAS vs R
/*=============================================================================
Program: 00_run_driver.sas
Purpose: Controlled execution driver (submission-grade pattern).
Author: Jonathan D. Stallings, PhD, MS
Notes: Centralizes parameters, sets paths, runs build + QC, writes manifest.
=============================================================================*/
options nodate nonumber mprint mlogic symbolgen validvarname=upcase missing=' ';
%let STUDYID = STUDY-XYZ;
%let ROOT = /path/to/project;
%let SDTM_DIR = &ROOT/data/sdtm;
%let ADAM_DIR = &ROOT/data/adam;
%let OUT_DIR = &ROOT/outputs;
%let LOG_DIR = &OUT_DIR/logs;
%let MAN_DIR = &OUT_DIR/manifests;
%let QC_DIR = &OUT_DIR/qc;
%let RUN_DTTM = %sysfunc(datetime(), e8601dt.);
%let RUN_ID = %sysfunc(compress(&RUN_DTTM, :-T));
/* Ensure output folders exist (DLCREATEDIR creates a directory when a libname points to it) */
options dlcreatedir;
libname _mk0 "&OUT_DIR";  libname _mk0 clear;
libname _mk1 "&LOG_DIR";  libname _mk1 clear;
libname _mk2 "&MAN_DIR";  libname _mk2 clear;
libname _mk3 "&QC_DIR";   libname _mk3 clear;
options nodlcreatedir;
libname sdtm "&SDTM_DIR";
libname adam "&ADAM_DIR";
libname out "&OUT_DIR";
libname qc "&QC_DIR";
%macro assert_exist(ds);
%if not %sysfunc(exist(&ds)) %then %do;
%put ERROR: Missing required dataset: &ds;
%abort cancel;
%end;
%mend;
%assert_exist(sdtm.dm);
%assert_exist(sdtm.ex);
/* Run build then QC */
%include "&ROOT/programs/sas/01_build_adsl.sas";
%include "&ROOT/programs/sas/02_qc_adsl.sas";
/* Minimal run manifest */
data out.run_manifest_sas;
length RUN_ID $40 RUN_DTTM $30 STUDYID $40 SDTM_DIR ADAM_DIR OUT_DIR $200;
RUN_ID="&RUN_ID";
RUN_DTTM="&RUN_DTTM";
STUDYID="&STUDYID";
SDTM_DIR="&SDTM_DIR";
ADAM_DIR="&ADAM_DIR";
OUT_DIR="&OUT_DIR";
run;
proc export data=out.run_manifest_sas
outfile="&MAN_DIR/run_manifest_sas_&RUN_ID..csv"
dbms=csv replace;
run;
# =============================================================================
# Program: 00_run_driver.R
# Purpose: Controlled execution driver (submission-grade pattern).
# Author: Jonathan D. Stallings, PhD, MS
# Notes: Centralizes parameters, runs build + QC, writes manifest + session.
# =============================================================================
study_id <- "STUDY-XYZ"
root <- "path/to/project"
paths <- list(
sdtm_dm = file.path(root, "data/sdtm/dm.csv"),
sdtm_ex = file.path(root, "data/sdtm/ex.csv"),
adam_adsl = file.path(root, "data/adam/adsl.csv"),
out_dir = file.path(root, "outputs"),
log_dir = file.path(root, "outputs/logs"),
man_dir = file.path(root, "outputs/manifests"),
qc_dir = file.path(root, "outputs/qc")
)
dir.create(paths$out_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$log_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$man_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$qc_dir, recursive = TRUE, showWarnings = FALSE)
stop_if_missing <- function(p) {
if (!file.exists(p)) stop(sprintf("Missing required input: %s", p), call. = FALSE)
}
stop_if_missing(paths$sdtm_dm)
stop_if_missing(paths$sdtm_ex)
run_dttm_utc <- format(Sys.time(), tz = "UTC", usetz = TRUE)
run_id <- gsub("[^0-9]", "", run_dttm_utc)
# Source build + QC scripts
source(file.path(root, "programs/r/01_build_adsl.R"))
source(file.path(root, "programs/r/02_qc_adsl.R"))
# Write run manifest + session info
manifest <- data.frame(
run_id = run_id,
run_dttm_utc = run_dttm_utc,
study_id = study_id,
sdtm_dm = paths$sdtm_dm,
sdtm_ex = paths$sdtm_ex,
adam_adsl = paths$adam_adsl,
stringsAsFactors = FALSE
)
write.csv(manifest, file.path(paths$man_dir, paste0("run_manifest_r_", run_id, ".csv")), row.names = FALSE)
capture.output(sessionInfo(), file = file.path(paths$man_dir, paste0("sessionInfo_", run_id, ".txt")))
What this demonstrates (FDA reviewer lens)
- Controlled execution: a single entry point with explicit parameters reduces ambiguity.
- Deterministic runs: consistent inputs/outputs with run identifiers support traceability.
- Audit trail artifacts: manifests and logs make it easy to reconstruct what happened.
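The R driver above writes a manifest and sessionInfo; console output can be captured the same way so the logs/ folder is populated on every run. A minimal sketch, assuming the paths and run_id objects from 00_run_driver.R; the sink() calls would wrap the source() lines.
# Mirror console and message output to outputs/logs/ so each run leaves a reviewable log.
log_con <- file(file.path(paths$log_dir, paste0("run_", run_id, ".log")), open = "wt")
sink(log_con, split = TRUE)      # console output, also shown on screen
sink(log_con, type = "message")  # messages and warnings
# ... source(build) and source(QC) calls go here ...
sink(type = "message"); sink()   # restore normal output
close(log_con)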
5 Section B — Build ADSL (derivation program)
5.1 Best-practice principles applied
- Single responsibility: one build program for ADSL
- Declared assumptions: date parsing, exposure summarization rules
- Explicit variable derivations with labels (SAS) and consistent naming (R)
- Integrity checks that stop the run on critical failures (duplicates, missing keys)
5.2 SAS vs R
/*=============================================================================
Program: 01_build_adsl.sas
Purpose: Build ADaM ADSL from SDTM DM and EX (submission-grade pattern).
Inputs: sdtm.dm, sdtm.ex
Output: adam.adsl
Assumptions: ISO 8601 dates; 1 primary treatment per subject (simplified).
=============================================================================*/
%macro stop_if_dups(ds, key);
proc sort data=&ds out=_chk nodupkey dupout=_dups; by &key; run;
%local ndup;
proc sql noprint; select count(*) into :ndup trimmed from _dups; quit;
%if &ndup > 0 %then %do;
%put ERROR: Duplicate key(s) detected in &ds by &key..;
proc export data=_dups outfile="&QC_DIR/dups_%scan(&ds,2,.).csv" dbms=csv replace; run;
%abort cancel;
%end;
%mend;
%macro assert_vars(ds, varlist);
%local i v;
%do i=1 %to %sysfunc(countw(&varlist));
%let v = %scan(&varlist,&i);
proc sql noprint;
select count(*) into :_vexists trimmed
from dictionary.columns
where libname=%upcase("%scan(&ds,1,.)")
and memname=%upcase("%scan(&ds,2,.)")
and name=%upcase("&v");
quit;
%if &_vexists = 0 %then %do;
%put ERROR: Variable &v not found in &ds;
%abort cancel;
%end;
%end;
%mend;
/* Validate required variables exist */
%assert_vars(sdtm.dm, STUDYID USUBJID RFSTDTC AGE AGEU SEX RACE ARM);
%assert_vars(sdtm.ex, STUDYID USUBJID EXTRT EXSTDTC EXENDTC);
/* Step 1: DM subset */
data work.dm0;
set sdtm.dm;
where STUDYID="&STUDYID";
keep STUDYID USUBJID SITEID SUBJID RFSTDTC RFENDTC BRTHDTC AGE AGEU SEX RACE ARM;
run;
%stop_if_dups(work.dm0, USUBJID);
/* Step 2: EX exposure summary */
data work.ex0;
set sdtm.ex;
where STUDYID="&STUDYID";
keep STUDYID USUBJID EXTRT EXSTDTC EXENDTC;
run;
proc sort data=work.ex0; by USUBJID EXSTDTC; run;
data work.ex_summ;
set work.ex0;
by USUBJID;
length TRT01P $20;
retain TRT01P TRTSDT TRTEDT;
format TRTSDT TRTEDT yymmdd10.;
if first.USUBJID then do;
TRT01P = strip(EXTRT);
TRTSDT = input(substr(EXSTDTC,1,10), yymmdd10.);
TRTEDT = .;
end;
/* last non-missing end date */
if not missing(EXENDTC) then TRTEDT = input(substr(EXENDTC,1,10), yymmdd10.);
if last.USUBJID then output;
keep USUBJID TRT01P TRTSDT TRTEDT;
run;
%stop_if_dups(work.ex_summ, USUBJID);
/* Step 3: Build ADSL */
proc sort data=work.dm0; by USUBJID; run;
proc sort data=work.ex_summ; by USUBJID; run;
data adam.adsl(label="Subject-Level Analysis Dataset (ADSL)");
merge work.dm0(in=a) work.ex_summ(in=b);
by USUBJID;
length TRT01A $20 SAFFL $1;
format RANDDT TRTSDT TRTEDT yymmdd10.;
if not a then delete;
RANDDT = input(substr(RFSTDTC,1,10), yymmdd10.);
SAFFL = ifc(b and not missing(TRTSDT), "Y", "N");
TRT01A = TRT01P;
label
RANDDT = "Date of Randomization/Reference Start Date"
TRT01P = "Planned Treatment for Period 01"
TRT01A = "Actual Treatment for Period 01"
TRTSDT = "Treatment Start Date"
TRTEDT = "Treatment End Date"
SAFFL = "Safety Population Flag"
;
/* Hard-stop integrity checks */
if missing(USUBJID) then do;
put "ERROR: Missing USUBJID in ADSL";
abort cancel;
end;
if SAFFL="Y" and missing(TRT01A) then do;
put "ERROR: SAFFL=Y but TRT01A missing for " USUBJID=;
abort cancel;
end;
run;
%stop_if_dups(adam.adsl, USUBJID);
/* Produce a small build summary artifact */
proc sql;
create table qc.adsl_build_summary as
select
count(*) as n_records,
sum(SAFFL="Y") as n_saffl,
sum(missing(USUBJID)) as n_missing_usubjid
from adam.adsl;
quit;
proc export data=qc.adsl_build_summary
outfile="&QC_DIR/adsl_build_summary.csv"
dbms=csv replace;
run;
# =============================================================================
# Program: 01_build_adsl.R
# Purpose: Build ADaM ADSL from SDTM-like DM/EX (submission-grade pattern).
# Inputs: dm.csv, ex.csv
# Output: adsl.csv
# Assump: ISO 8601 dates; 1 primary treatment per subject (simplified).
# =============================================================================
# Explicit namespaces (portable across environments)
dm_path <- paths$sdtm_dm
ex_path <- paths$sdtm_ex
adsl_path <- paths$adam_adsl
iso_to_date <- function(x) {
if (all(is.na(x))) return(as.Date(rep(NA, length(x))))
as.Date(substr(x, 1, 10))
}
assert_cols <- function(df, cols, name) {
missing <- setdiff(cols, names(df))
if (length(missing) > 0) {
stop(sprintf("Missing required columns in %s: %s", name, paste(missing, collapse = ", ")), call. = FALSE)
}
}
stop_if_dups <- function(df, key, out_csv) {
tab <- df |>
dplyr::count(dplyr::across(dplyr::all_of(key))) |>
dplyr::filter(.data$n > 1)
if (nrow(tab) > 0) {
readr::write_csv(tab, out_csv)
stop(sprintf("Duplicate key(s) detected by %s. See: %s", paste(key, collapse = ", "), out_csv), call. = FALSE)
}
}
dm <- readr::read_csv(dm_path, show_col_types = FALSE) |>
dplyr::filter(.data$STUDYID == study_id)
ex <- readr::read_csv(ex_path, show_col_types = FALSE) |>
dplyr::filter(.data$STUDYID == study_id)
assert_cols(dm, c("STUDYID","USUBJID","RFSTDTC","AGE","AGEU","SEX","RACE","ARM"), "DM")
assert_cols(ex, c("STUDYID","USUBJID","EXTRT","EXSTDTC","EXENDTC"), "EX")
dm0 <- dm |>
dplyr::select(STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC, AGE, AGEU, SEX, RACE, ARM)
stop_if_dups(dm0, "USUBJID", file.path(paths$qc_dir, "dups_dm0.csv"))
ex_summ <- ex |>
dplyr::arrange(.data$USUBJID, .data$EXSTDTC) |>
dplyr::group_by(.data$USUBJID) |>
dplyr::summarise(
TRT01P = dplyr::first(.data$EXTRT),
TRTSDT = iso_to_date(dplyr::first(.data$EXSTDTC)),
TRTEDT = {
end_nonmissing <- .data$EXENDTC[!is.na(.data$EXENDTC)]
if (length(end_nonmissing) == 0) as.Date(NA) else iso_to_date(end_nonmissing[length(end_nonmissing)])
},
.groups = "drop"
)
stop_if_dups(ex_summ, "USUBJID", file.path(paths$qc_dir, "dups_ex_summ.csv"))
adsl <- dm0 |>
dplyr::left_join(ex_summ, by = "USUBJID") |>
dplyr::mutate(
RANDDT = iso_to_date(.data$RFSTDTC),
SAFFL = dplyr::if_else(!is.na(.data$TRTSDT), "Y", "N"),
TRT01A = .data$TRT01P
) |>
dplyr::select(
STUDYID, USUBJID, SITEID, SUBJID, ARM,
RANDDT, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL,
AGE, AGEU, SEX, RACE
)
# Hard-stop integrity checks
if (any(is.na(adsl$USUBJID) | adsl$USUBJID == "")) stop("Missing USUBJID in ADSL.", call. = FALSE)
stop_if_dups(adsl, "USUBJID", file.path(paths$qc_dir, "dups_adsl.csv"))
if (any(adsl$SAFFL == "Y" & is.na(adsl$TRT01A))) stop("SAFFL=Y but TRT01A missing.", call. = FALSE)
# Write output (deterministic)
readr::write_csv(adsl, adsl_path)
# Build summary artifact
summary <- adsl |>
dplyr::summarise(
n_records = dplyr::n(),
n_saffl = sum(.data$SAFFL == "Y"),
n_missing_usubjid = sum(is.na(.data$USUBJID) | .data$USUBJID == "")
)
readr::write_csv(summary, file.path(paths$qc_dir, "adsl_build_summary_r.csv"))
What this demonstrates (industry best practice)
- Traceability: explicit derivations from SDTM-like sources to ADaM variables (see the derivation-map sketch after this list).
- Integrity: hard-stop checks for keys, duplicates, and internal consistency.
- Audit readiness: run artifacts (summaries, manifests) are produced every run.
- Reproducibility: deterministic outputs and parameterized paths support repeatable execution.
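One lightweight way to make that traceability explicit is a variable-level derivation map written as a run artifact. This is a sketch only; in a real package this information lives in the dataset specification and Define-XML, and the output file name is an assumption.
# Illustrative derivation map for the ADSL variables built above.
derivation_map <- data.frame(
  variable   = c("RANDDT", "TRT01P", "TRTSDT", "TRTEDT", "SAFFL"),
  source     = c("DM.RFSTDTC", "EX.EXTRT", "EX.EXSTDTC", "EX.EXENDTC", "EX"),
  derivation = c(
    "First 10 characters of RFSTDTC converted to a date",
    "EXTRT of the earliest exposure record (by EXSTDTC)",
    "Earliest EXSTDTC converted to a date",
    "Last non-missing EXENDTC converted to a date",
    "Y if any exposure with non-missing TRTSDT, else N"
  ),
  stringsAsFactors = FALSE
)
readr::write_csv(derivation_map, file.path(paths$qc_dir, "adsl_derivation_map.csv"))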
6 Section C — Independent QC (dual programming pattern)
6.1 Why QC independence matters
A submission-grade practice is that QC:
- is independent (separate code path)
- reconstructs critical variables
- compares to production output using a deterministic comparison step
- writes a QC report artifact
6.2 SAS vs R
/*=============================================================================
Program: 02_qc_adsl.sas
Purpose: Independent QC build of ADSL and compare to production ADSL.
Inputs: sdtm.dm, sdtm.ex, adam.adsl
Outputs: qc.adsl_qc, comparison report CSV
QC Approach: Re-derive key fields via independent logic, then PROC COMPARE.
=============================================================================*/
%macro assert_exist(ds);
%if not %sysfunc(exist(&ds)) %then %do;
%put ERROR: Missing required dataset: &ds;
%abort cancel;
%end;
%mend;
%assert_exist(adam.adsl);
/* Independent rebuild (intentionally coded differently than production) */
proc sql;
create table qc.dm_qc as
select
STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC, AGE, AGEU, SEX, RACE, ARM
from sdtm.dm
where STUDYID="&STUDYID";
quit;
proc sql;
create table qc.ex_qc as
select STUDYID, USUBJID, EXTRT, EXSTDTC, EXENDTC
from sdtm.ex
where STUDYID="&STUDYID";
quit;
proc sort data=qc.ex_qc; by USUBJID EXSTDTC; run;
/* Different method: PROC SQL aggregation for dates; remerge idiom for the first treatment.
   (PROC SQL supports neither ORDER BY in subqueries nor FETCH FIRST.) */
proc sql;
create table qc.ex_dates_qc as
select
USUBJID,
min(input(substr(EXSTDTC,1,10), yymmdd10.)) as TRTSDT format=yymmdd10.,
max(input(substr(EXENDTC,1,10), yymmdd10.)) as TRTEDT format=yymmdd10.
from qc.ex_qc
group by USUBJID;
/* planned trt = EXTRT on the record with the earliest EXSTDTC (assumes one such record per subject) */
create table qc.ex_trt_qc as
select USUBJID, EXTRT as TRT01P length=20
from qc.ex_qc
group by USUBJID
having EXSTDTC = min(EXSTDTC);
create table qc.ex_summ_qc as
select d.USUBJID, t.TRT01P, d.TRTSDT, d.TRTEDT
from qc.ex_dates_qc d left join qc.ex_trt_qc t
on d.USUBJID = t.USUBJID;
quit;
proc sort data=qc.dm_qc; by USUBJID; run;
proc sort data=qc.ex_summ_qc; by USUBJID; run;
data qc.adsl_qc;
merge qc.dm_qc(in=a) qc.ex_summ_qc(in=b);
by USUBJID;
length TRT01A $20 SAFFL $1;
format RANDDT TRTSDT TRTEDT yymmdd10.;
if not a then delete;
RANDDT = input(substr(RFSTDTC,1,10), yymmdd10.);
SAFFL = ifc(b and not missing(TRTSDT), "Y", "N");
TRT01A = TRT01P;
keep STUDYID USUBJID SITEID SUBJID ARM RANDDT TRT01P TRT01A TRTSDT TRTEDT
SAFFL AGE AGEU SEX RACE;
run;
/* Compare QC vs PROD (sort a WORK copy so the production dataset is not modified by QC) */
proc sort data=adam.adsl out=work.adsl_prod; by USUBJID; run;
proc sort data=qc.adsl_qc; by USUBJID; run;
proc compare base=work.adsl_prod compare=qc.adsl_qc out=qc.adsl_compare_out outnoequal noprint;
id USUBJID;
run;
/* Create a compact QC summary artifact */
proc sql;
create table qc.adsl_qc_summary as
select a.n_prod, b.n_qc, c.n_differences
from (select count(*) as n_prod from adam.adsl) as a,
(select count(*) as n_qc from qc.adsl_qc) as b,
(select count(*) as n_differences from qc.adsl_compare_out) as c;
quit;
proc export data=qc.adsl_qc_summary
outfile="&QC_DIR/adsl_qc_summary.csv"
dbms=csv replace;
run;
# =============================================================================
# Program: 02_qc_adsl.R
# Purpose: Independent QC build of ADSL and compare to production ADSL.
# Inputs: dm.csv, ex.csv, adsl.csv
# Outputs: qc_adsl.csv, qc_compare.csv, qc_summary.csv
# QC: Re-derive using different approach; compare row/col equality.
# =============================================================================
iso_to_date <- function(x) as.Date(substr(x, 1, 10))
adsl_prod <- readr::read_csv(paths$adam_adsl, show_col_types = FALSE)
dm_qc <- readr::read_csv(paths$sdtm_dm, show_col_types = FALSE) |>
dplyr::filter(.data$STUDYID == study_id) |>
dplyr::select(STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC,
AGE, AGEU, SEX, RACE, ARM)
ex_qc <- readr::read_csv(paths$sdtm_ex, show_col_types = FALSE) |>
dplyr::filter(.data$STUDYID == study_id) |>
dplyr::select(STUDYID, USUBJID, EXTRT, EXSTDTC, EXENDTC)
# Different approach than production: aggregate min/max dates and pick the
# first treatment by earliest EXSTDTC
ex_summ_qc <- ex_qc |>
dplyr::arrange(.data$USUBJID, .data$EXSTDTC) |>
dplyr::group_by(.data$USUBJID) |>
dplyr::summarise(
TRTSDT = min(iso_to_date(.data$EXSTDTC), na.rm = TRUE),
TRTEDT = {
end_dates <- .data$EXENDTC[!is.na(.data$EXENDTC)]
if (length(end_dates) == 0) as.Date(NA) else max(iso_to_date(end_dates), na.rm = TRUE)
},
TRT01P = dplyr::first(.data$EXTRT),
.groups = "drop"
) |>
dplyr::mutate(
# min() over an all-missing group yields Inf; reset to NA without dropping the Date class
TRTSDT = dplyr::if_else(is.infinite(as.numeric(.data$TRTSDT)), as.Date(NA), .data$TRTSDT)
)
adsl_qc <- dm_qc |>
dplyr::left_join(ex_summ_qc, by = "USUBJID") |>
dplyr::mutate(
RANDDT = iso_to_date(.data$RFSTDTC),
SAFFL = dplyr::if_else(!is.na(.data$TRTSDT), "Y", "N"),
TRT01A = .data$TRT01P
) |>
dplyr::select(
STUDYID, USUBJID, SITEID, SUBJID, ARM,
RANDDT, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL,
AGE, AGEU, SEX, RACE
) |>
dplyr::arrange(.data$USUBJID)
adsl_prod2 <- adsl_prod |>
dplyr::arrange(.data$USUBJID)
# Compare: full join on USUBJID and flag cell-level differences for key derived
# variables (expandable). An NA-safe inequality is used so a missing-vs-nonmissing
# mismatch is flagged rather than silently producing NA.
neq <- function(a, b) xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
cmp <- adsl_prod2 |>
dplyr::full_join(adsl_qc, by = "USUBJID", suffix = c("_PROD", "_QC"))
row_diff <- cmp |>
dplyr::mutate(
DIFF_RANDDT = neq(.data$RANDDT_PROD, .data$RANDDT_QC),
DIFF_TRT01P = neq(.data$TRT01P_PROD, .data$TRT01P_QC),
DIFF_TRT01A = neq(.data$TRT01A_PROD, .data$TRT01A_QC),
DIFF_TRTSDT = neq(.data$TRTSDT_PROD, .data$TRTSDT_QC),
DIFF_TRTEDT = neq(.data$TRTEDT_PROD, .data$TRTEDT_QC),
DIFF_SAFFL = neq(.data$SAFFL_PROD, .data$SAFFL_QC)
) |>
dplyr::mutate(
DIFF_ANY = dplyr::if_any(dplyr::starts_with("DIFF_"))
) |>
dplyr::filter(.data$DIFF_ANY) |>
dplyr::select(USUBJID, dplyr::starts_with("DIFF_"),
dplyr::ends_with("_PROD"), dplyr::ends_with("_QC"))
readr::write_csv(adsl_qc, file.path(paths$qc_dir, "adsl_qc_r.csv"))
readr::write_csv(row_diff, file.path(paths$qc_dir, "adsl_compare_r.csv"))
qc_summary <- data.frame(
n_prod = nrow(adsl_prod2),
n_qc = nrow(adsl_qc),
n_diff = nrow(row_diff),
stringsAsFactors = FALSE
)
readr::write_csv(qc_summary, file.path(paths$qc_dir, "adsl_qc_summary_r.csv"))
What this demonstrates (FDA reviewer lens)
- QC independence: production and QC derivations are coded differently, reducing shared-mode failure risk.
- Deterministic comparison: explicit compare outputs make discrepancies reviewable and auditable.
- Run artifacts: QC summaries and difference listings are created as persistent records of the QC process.
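If the process requires that a run cannot complete with open discrepancies, the QC summary can also act as a gate. A minimal sketch, assuming the qc_summary object from 02_qc_adsl.R is still in scope; this check is not part of the programs above.
# Hard-stop when the deterministic comparison found any subject-level differences.
if (qc_summary$n_diff > 0) {
  stop(sprintf("QC comparison found %d subject(s) with differences; see %s",
               qc_summary$n_diff,
               file.path(paths$qc_dir, "adsl_compare_r.csv")),
       call. = FALSE)
}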
7 Section D — “Submission-grade” completeness checklist
This example includes the core elements that distinguish “regulated” code from ordinary analytics.
- Controlled entrypoint (driver); see the usage sketch after this list
- Parameterized study ID and paths
- Input dataset and variable existence checks
- Deterministic dataset build logic
- Hard-stop integrity checks (keys, duplicates, consistency)
- Independent QC build with a different derivation approach
- Compare outputs and persistent QC artifacts (CSV summaries)
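For completeness, a usage sketch of the controlled entry point (illustrative; the path is the placeholder used throughout):
# Run the R pipeline from a fresh session so no stale objects leak into the run;
# the SAS driver is executed analogously (batch submit of 00_run_driver.sas).
setwd("/path/to/project")
source("programs/r/00_run_driver.R")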