Submission-Grade Statistical Programming Example (SAS + R)

1 Purpose and scope

This document provides a submission-grade programming example that shows how regulatory requirements are translated into executable, auditable code. It includes:

  • ADaM ADSL build from SDTM-like inputs (DM, EX)
  • Independent QC build and comparison
  • Run artifacts supporting audit readiness: manifest, checks, and deterministic outputs
  • SAS and R implementations with aligned logic and best-practice commentary

Important scope note

This example focuses on: (1) dataset derivation rigor, (2) audit-friendly execution, and (3) QC independence. In real submission packages, additional study-specific artifacts exist (SAP, dataset specs, Define-XML, SDRG, controlled terminology libraries). Those items are intentionally not fully generated here to keep the example readable and website-friendly.


3 Example dataset logic (ADSL)

This example derives a minimal but realistic ADSL:

  • Keys: STUDYID, USUBJID
  • Dates: RANDDT, TRTSDT, TRTEDT
  • Treatment: TRT01P, TRT01A
  • Population: SAFFL (any exposure)

Assumptions (declared in code):

  • SDTM-like inputs have ISO 8601 date strings (YYYY-MM-DD...)
  • One primary exposure treatment per subject (simplified)

4 Section A — Run Driver (controlled execution)

4.1 Why this matters (best practice)

A “submission-grade” pipeline has a driver that:

  • centralizes parameters (study ID, paths, run ID)
  • enforces input existence and write locations
  • produces a run manifest (what ran, when, with what inputs)
  • stops on failure (no silent partial runs)
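
The "stops on failure" behaviour can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `run_steps` and the step names are hypothetical, and the failing step is simulated. Each step runs in order, the first error ends the loop, and the status table records which steps never ran.

```r
# Hypothetical step runner: runs named steps in order and stops at the first failure
run_steps <- function(steps) {
  status <- data.frame(
    step   = names(steps),
    status = "NOT RUN",
    stringsAsFactors = FALSE
  )
  for (i in seq_along(steps)) {
    ok <- tryCatch({
      steps[[i]]()
      TRUE
    }, error = function(e) {
      message(sprintf("Step '%s' failed: %s", names(steps)[i], conditionMessage(e)))
      FALSE
    })
    status$status[i] <- if (ok) "OK" else "FAILED"
    if (!ok) break  # later steps keep the status "NOT RUN"
  }
  status
}

# Simulated pipeline: the QC step fails, so the report step must not run
steps <- list(
  build  = function() invisible(TRUE),
  qc     = function() stop("simulated QC failure"),
  report = function() invisible(TRUE)
)

run_status <- run_steps(steps)
print(run_status)
```

A real driver would probably write `run_status` to the manifest folder and then call `stop()` if any step failed, so that the process exits with an error instead of reporting a partial run as complete.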

4.2 SAS vs R

SAS — 00_run_driver.sas
/*=============================================================================
Program:     00_run_driver.sas
Purpose:     Controlled execution driver (submission-grade pattern).
Author:      Jonathan D. Stallings, PhD, MS
Notes:       Centralizes parameters, sets paths, runs build + QC, writes manifest.
=============================================================================*/

options nodate nonumber mprint mlogic symbolgen validvarname=upcase missing=' ';

%let STUDYID   = STUDY-XYZ;
%let ROOT      = /path/to/project;
%let SDTM_DIR  = &ROOT/data/sdtm;
%let ADAM_DIR  = &ROOT/data/adam;
%let OUT_DIR   = &ROOT/outputs;
%let LOG_DIR   = &OUT_DIR/logs;
%let MAN_DIR   = &OUT_DIR/manifests;
%let QC_DIR    = &OUT_DIR/qc;

%let RUN_DTTM  = %sysfunc(datetime(), e8601dt.);
%let RUN_ID    = %sysfunc(compress(&RUN_DTTM, :-T));

/* Ensure output folders exist. DLCREATEDIR creates only the last folder in a
   path, so assign a libref to each output folder while the option is on. */
options dlcreatedir;
libname out  "&OUT_DIR";
libname _log "&LOG_DIR";
libname _man "&MAN_DIR";
libname qc   "&QC_DIR";
libname _log clear;
libname _man clear;
options nodlcreatedir;

libname sdtm "&SDTM_DIR";
libname adam "&ADAM_DIR";

%macro assert_exist(ds);
  %if not %sysfunc(exist(&ds)) %then %do;
    %put ERROR: Missing required dataset: &ds;
    %abort cancel;
  %end;
%mend;

%assert_exist(sdtm.dm);
%assert_exist(sdtm.ex);

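/* Run build then QC; stop if a step left an error condition (SYSCC > 4) */
%macro stop_if_error(step);
  %if &syscc > 4 %then %do;
    %put ERROR: &step ended with SYSCC=&syscc.. Stopping run.;
    %abort cancel;
  %end;
%mend;

%include "&ROOT/programs/sas/01_build_adsl.sas";
%stop_if_error(01_build_adsl);
%include "&ROOT/programs/sas/02_qc_adsl.sas";
%stop_if_error(02_qc_adsl);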

/* Minimal run manifest */
data out.run_manifest_sas;
  length RUN_ID $40 RUN_DTTM $30 STUDYID $40 SDTM_DIR ADAM_DIR OUT_DIR $200;
  RUN_ID="&RUN_ID";
  RUN_DTTM="&RUN_DTTM";
  STUDYID="&STUDYID";
  SDTM_DIR="&SDTM_DIR";
  ADAM_DIR="&ADAM_DIR";
  OUT_DIR="&OUT_DIR";
run;

proc export data=out.run_manifest_sas
  outfile="&MAN_DIR/run_manifest_sas_&RUN_ID..csv"
  dbms=csv replace;
run;

R — 00_run_driver.R
# =============================================================================
# Program:  00_run_driver.R
# Purpose:  Controlled execution driver (submission-grade pattern).
# Author:   Jonathan D. Stallings, PhD, MS
# Notes:    Centralizes parameters, runs build + QC, writes manifest + session.
# =============================================================================

study_id <- "STUDY-XYZ"

root <- "/path/to/project"  # same placeholder root as the SAS driver
paths <- list(
  sdtm_dm = file.path(root, "data/sdtm/dm.csv"),
  sdtm_ex = file.path(root, "data/sdtm/ex.csv"),
  adam_adsl = file.path(root, "data/adam/adsl.csv"),
  out_dir = file.path(root, "outputs"),
  log_dir = file.path(root, "outputs/logs"),
  man_dir = file.path(root, "outputs/manifests"),
  qc_dir  = file.path(root, "outputs/qc")
)

dir.create(dirname(paths$adam_adsl), recursive = TRUE, showWarnings = FALSE)  # folder for adsl.csv
dir.create(paths$out_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$log_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$man_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(paths$qc_dir,  recursive = TRUE, showWarnings = FALSE)

stop_if_missing <- function(p) {
  if (!file.exists(p)) stop(sprintf("Missing required input: %s", p), call. = FALSE)
}

stop_if_missing(paths$sdtm_dm)
stop_if_missing(paths$sdtm_ex)

run_dttm_utc <- format(Sys.time(), tz = "UTC", usetz = TRUE)
run_id <- gsub("[^0-9]", "", run_dttm_utc)

# Source build + QC scripts
source(file.path(root, "programs/r/01_build_adsl.R"))
source(file.path(root, "programs/r/02_qc_adsl.R"))

# Write run manifest + session info
manifest <- data.frame(
  run_id = run_id,
  run_dttm_utc = run_dttm_utc,
  study_id = study_id,
  sdtm_dm = paths$sdtm_dm,
  sdtm_ex = paths$sdtm_ex,
  adam_adsl = paths$adam_adsl,
  stringsAsFactors = FALSE
)

write.csv(manifest, file.path(paths$man_dir, paste0("run_manifest_r_", run_id, ".csv")), row.names = FALSE)
capture.output(sessionInfo(), file = file.path(paths$man_dir, paste0("sessionInfo_", run_id, ".txt")))

What this demonstrates (FDA reviewer lens)

  • Controlled execution: a single entry point with explicit parameters reduces ambiguity.
  • Deterministic runs: consistent inputs/outputs with run identifiers support traceability.
  • Audit trail artifacts: manifests and logs make it easy to reconstruct what happened.
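
The manifests above record input paths. One possible extension, sketched here rather than taken from the drivers, is to record a checksum for each input so that a later reader can confirm the same file contents were used. The temporary files below stand in for dm.csv and ex.csv, and `input_manifest` is a hypothetical name; `tools::md5sum` and `file.size` ship with base R.

```r
# Two temporary files stand in for the real inputs (illustrative content only)
tmp_dir <- tempfile("manifest_demo_")
dir.create(tmp_dir)
dm_file <- file.path(tmp_dir, "dm.csv")
ex_file <- file.path(tmp_dir, "ex.csv")
writeLines(c("USUBJID,RFSTDTC", "S-001,2024-01-10"), dm_file)
writeLines(c("USUBJID,EXTRT", "S-001,DRUG A"), ex_file)

# One manifest row per input: name, size in bytes, and MD5 checksum
input_files <- c(dm_file, ex_file)
input_manifest <- data.frame(
  file  = basename(input_files),
  bytes = file.size(input_files),
  md5   = unname(tools::md5sum(input_files)),
  stringsAsFactors = FALSE
)
print(input_manifest)

# Any change to a file's content changes its checksum
md5_before <- unname(tools::md5sum(dm_file))
cat("S-002,2024-01-12\n", file = dm_file, append = TRUE)
md5_after <- unname(tools::md5sum(dm_file))
```

Appending these columns to the run manifest would tie each run to the exact bytes it read, not just to a file location.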


5 Section B — Build ADSL (derivation program)

5.1 Best-practice principles applied

  • Single responsibility: one build program for ADSL
  • Declared assumptions: date parsing, exposure summarization rules
  • Explicit variable derivations with labels (SAS) and consistent naming (R)
  • Integrity checks that stop the run on critical failures (duplicates, missing keys)

5.2 SAS vs R

SAS — 01_build_adsl.sas
/*=============================================================================
Program:     01_build_adsl.sas
Purpose:     Build ADaM ADSL from SDTM DM and EX (submission-grade pattern).
Inputs:      sdtm.dm, sdtm.ex
Output:      adam.adsl
Assumptions: ISO 8601 dates; 1 primary treatment per subject (simplified).
=============================================================================*/

%macro stop_if_dups(ds, key);
  proc sort data=&ds out=_chk nodupkey dupout=_dups; by &key; run;
  %local ndup;
  proc sql noprint; select count(*) into :ndup trimmed from _dups; quit;
  %if &ndup > 0 %then %do;
    %put ERROR: Duplicate key(s) detected in &ds by &key..;
    proc export data=_dups outfile="&QC_DIR/dups_%scan(&ds,2,.).csv" dbms=csv replace; run;
    %abort cancel;
  %end;
%mend;

%macro assert_vars(ds, varlist);
  %local i v _vexists;
  %do i=1 %to %sysfunc(countw(&varlist));
    %let v = %scan(&varlist,&i);
    proc sql noprint;
      select count(*) into :_vexists trimmed
      from dictionary.columns
      where libname=%upcase("%scan(&ds,1,.)")
        and memname=%upcase("%scan(&ds,2,.)")
        and upcase(name)=%upcase("&v");  /* NAME keeps its stored case */
    quit;
    %if &_vexists = 0 %then %do;
      %put ERROR: Variable &v not found in &ds;
      %abort cancel;
    %end;
  %end;
%mend;

/* Validate required variables exist (every variable kept below is checked) */
%assert_vars(sdtm.dm, STUDYID USUBJID SITEID SUBJID RFSTDTC RFENDTC BRTHDTC AGE AGEU SEX RACE ARM);
%assert_vars(sdtm.ex, STUDYID USUBJID EXTRT EXSTDTC EXENDTC);

/* Step 1: DM subset */
data work.dm0;
  set sdtm.dm;
  where STUDYID="&STUDYID";
  keep STUDYID USUBJID SITEID SUBJID RFSTDTC RFENDTC BRTHDTC AGE AGEU SEX RACE ARM;
run;

%stop_if_dups(work.dm0, USUBJID);

/* Step 2: EX exposure summary */
data work.ex0;
  set sdtm.ex;
  where STUDYID="&STUDYID";
  keep STUDYID USUBJID EXTRT EXSTDTC EXENDTC;
run;

proc sort data=work.ex0; by USUBJID EXSTDTC; run;

data work.ex_summ;
  set work.ex0;
  by USUBJID;
  length TRT01P $20;
  retain TRT01P TRTSDT TRTEDT;
  format TRTSDT TRTEDT yymmdd10.;

  if first.USUBJID then do;
    TRT01P = strip(EXTRT);
    TRTSDT = input(substr(EXSTDTC,1,10), yymmdd10.);
    TRTEDT = .;
  end;

  /* last non-missing end date */
  if not missing(EXENDTC) then TRTEDT = input(substr(EXENDTC,1,10), yymmdd10.);

  if last.USUBJID then output;
  keep USUBJID TRT01P TRTSDT TRTEDT;
run;

%stop_if_dups(work.ex_summ, USUBJID);

/* Step 3: Build ADSL */
proc sort data=work.dm0; by USUBJID; run;
proc sort data=work.ex_summ; by USUBJID; run;

data adam.adsl(label="Subject-Level Analysis Dataset (ADSL)");
  merge work.dm0(in=a) work.ex_summ(in=b);
  by USUBJID;

  length TRT01A $20 SAFFL $1;
  format RANDDT TRTSDT TRTEDT yymmdd10.;

  if not a then delete;

  RANDDT = input(substr(RFSTDTC,1,10), yymmdd10.);
  SAFFL  = ifc(b and not missing(TRTSDT), "Y", "N");
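  TRT01A = TRT01P;

  /* Keep the same variables as the R build and the QC dataset */
  keep STUDYID USUBJID SITEID SUBJID ARM RANDDT TRT01P TRT01A TRTSDT TRTEDT
       SAFFL AGE AGEU SEX RACE;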

  label
    RANDDT = "Date of Randomization/Reference Start Date"
    TRT01P = "Planned Treatment for Period 01"
    TRT01A = "Actual Treatment for Period 01"
    TRTSDT = "Treatment Start Date"
    TRTEDT = "Treatment End Date"
    SAFFL  = "Safety Population Flag"
  ;

  /* Hard-stop integrity checks */
  if missing(USUBJID) then do;
    put "ERROR: Missing USUBJID in ADSL";
    abort cancel;
  end;

  if SAFFL="Y" and missing(TRT01A) then do;
    put "ERROR: SAFFL=Y but TRT01A missing for " USUBJID=;
    abort cancel;
  end;
run;

%stop_if_dups(adam.adsl, USUBJID);

/* Produce a small build summary artifact */
proc sql;
  create table qc.adsl_build_summary as
  select
    count(*) as n_records,
    sum(SAFFL="Y") as n_saffl,
    sum(missing(USUBJID)) as n_missing_usubjid
  from adam.adsl;
quit;

proc export data=qc.adsl_build_summary
  outfile="&QC_DIR/adsl_build_summary.csv"
  dbms=csv replace;
run;

R — 01_build_adsl.R
# =============================================================================
# Program:  01_build_adsl.R
# Purpose:  Build ADaM ADSL from SDTM-like DM/EX (submission-grade pattern).
# Inputs:   dm.csv, ex.csv
# Output:   adsl.csv
# Assump:   ISO 8601 dates; 1 primary treatment per subject (simplified).
# =============================================================================

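# Paths come from the driver's `paths` list; package functions are called with
# explicit namespaces (dplyr::, readr::) so no library() calls are needed.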
dm_path <- paths$sdtm_dm
ex_path <- paths$sdtm_ex
adsl_path <- paths$adam_adsl

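iso_to_date <- function(x) {
  # Keep the date part (YYYY-MM-DD). With an explicit format, partial or
  # malformed dates become NA instead of raising an error.
  as.Date(substr(as.character(x), 1, 10), format = "%Y-%m-%d")
}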

assert_cols <- function(df, cols, name) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    stop(sprintf("Missing required columns in %s: %s", name, paste(missing, collapse = ", ")), call. = FALSE)
  }
}

stop_if_dups <- function(df, key, out_csv) {
  tab <- df |>
    dplyr::count(dplyr::across(dplyr::all_of(key))) |>
    dplyr::filter(.data$n > 1)

  if (nrow(tab) > 0) {
    readr::write_csv(tab, out_csv)
    stop(sprintf("Duplicate key(s) detected by %s. See: %s", paste(key, collapse = ", "), out_csv), call. = FALSE)
  }
}

dm <- readr::read_csv(dm_path, show_col_types = FALSE)
ex <- readr::read_csv(ex_path, show_col_types = FALSE)

# Check columns before filtering so a missing STUDYID gives a clear message;
# every column selected below is checked.
assert_cols(dm, c("STUDYID","USUBJID","SITEID","SUBJID","RFSTDTC","RFENDTC","BRTHDTC","AGE","AGEU","SEX","RACE","ARM"), "DM")
assert_cols(ex, c("STUDYID","USUBJID","EXTRT","EXSTDTC","EXENDTC"), "EX")

dm <- dm |> dplyr::filter(.data$STUDYID == study_id)
ex <- ex |> dplyr::filter(.data$STUDYID == study_id)

dm0 <- dm |>
  dplyr::select(STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC, AGE, AGEU, SEX, RACE, ARM)

stop_if_dups(dm0, "USUBJID", file.path(paths$qc_dir, "dups_dm0.csv"))

ex_summ <- ex |>
  dplyr::arrange(.data$USUBJID, .data$EXSTDTC) |>
  dplyr::group_by(.data$USUBJID) |>
  dplyr::summarise(
    TRT01P = dplyr::first(.data$EXTRT),
    TRTSDT = iso_to_date(dplyr::first(.data$EXSTDTC)),
    TRTEDT = {
      end_nonmissing <- .data$EXENDTC[!is.na(.data$EXENDTC)]
      if (length(end_nonmissing) == 0) as.Date(NA) else iso_to_date(end_nonmissing[length(end_nonmissing)])
    },
    .groups = "drop"
  )

stop_if_dups(ex_summ, "USUBJID", file.path(paths$qc_dir, "dups_ex_summ.csv"))

adsl <- dm0 |>
  dplyr::left_join(ex_summ, by = "USUBJID") |>
  dplyr::mutate(
    RANDDT = iso_to_date(.data$RFSTDTC),
    SAFFL  = dplyr::if_else(!is.na(.data$TRTSDT), "Y", "N"),
    TRT01A = .data$TRT01P
  ) |>
  dplyr::select(
    STUDYID, USUBJID, SITEID, SUBJID, ARM,
    RANDDT, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL,
    AGE, AGEU, SEX, RACE
  )

# Hard-stop integrity checks
if (any(is.na(adsl$USUBJID) | adsl$USUBJID == "")) stop("Missing USUBJID in ADSL.", call. = FALSE)
stop_if_dups(adsl, "USUBJID", file.path(paths$qc_dir, "dups_adsl.csv"))
if (any(adsl$SAFFL == "Y" & is.na(adsl$TRT01A))) stop("SAFFL=Y but TRT01A missing.", call. = FALSE)

# Write output (deterministic)
readr::write_csv(adsl, adsl_path)

# Build summary artifact (named build_summary to avoid masking base::summary)
build_summary <- adsl |>
  dplyr::summarise(
    n_records = dplyr::n(),
    n_saffl   = sum(.data$SAFFL == "Y"),
    n_missing_usubjid = sum(is.na(.data$USUBJID) | .data$USUBJID == "")
  )

readr::write_csv(build_summary, file.path(paths$qc_dir, "adsl_build_summary_r.csv"))

What this demonstrates (industry best practice)

  • Traceability: explicit derivations from SDTM-like sources to ADaM variables.
  • Integrity: hard-stop checks for keys, duplicates, and internal consistency.
  • Audit readiness: run artifacts (summaries, manifests) are produced every run.
  • Reproducibility: deterministic outputs and parameterized paths support repeatable execution.
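
One way to check the reproducibility claim is to write the same data twice and compare file checksums. The sketch below uses base R and made-up records; `write_sorted_csv` is a hypothetical helper. It assumes that sorting by USUBJID before writing is enough to make the file independent of input row order.

```r
# Toy ADSL-like data in an arbitrary row order (illustrative values only)
adsl_demo <- data.frame(
  USUBJID = c("S-002", "S-001"),
  SAFFL   = c("N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fix the row order before writing so the file content
# does not depend on the order in which rows arrived
write_sorted_csv <- function(df, path) {
  df <- df[order(df$USUBJID), , drop = FALSE]
  write.csv(df, path, row.names = FALSE, na = "")
}

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write_sorted_csv(adsl_demo, f1)
write_sorted_csv(adsl_demo[2:1, ], f2)  # same rows, reversed input order

# Identical checksums mean the two files are byte-for-byte identical
same_output <- identical(unname(tools::md5sum(f1)), unname(tools::md5sum(f2)))
print(same_output)
```

The same check can be run on two complete pipeline runs: if the inputs are unchanged, the output checksums should match.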


6 Section C — Independent QC (dual programming pattern)

6.1 Why QC independence matters

A submission-grade practice is that QC:

  • is independent (separate code path)
  • reconstructs critical variables
  • compares to production output using a deterministic comparison step
  • writes a QC report artifact
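
A deterministic comparison has to treat missing values explicitly, because in R `!=` returns NA whenever either side is NA. The following base-R sketch, on made-up records, shows one NA-safe way to flag cell-level differences; `cell_differs` is a hypothetical helper, and comparing values as character strings is an assumption that suits dates and text but not floating-point results.

```r
# Toy production and QC datasets (illustrative values only)
prod <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003"),
  TRTEDT  = as.Date(c("2024-03-09", NA, "2024-04-01")),
  SAFFL   = c("Y", "N", "Y"),
  stringsAsFactors = FALSE
)
qc <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003"),
  TRTEDT  = as.Date(c("2024-03-09", NA, NA)),
  SAFFL   = c("Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# NA-safe comparison: two NAs match; NA against a value counts as a difference
cell_differs <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
}

cmp <- merge(prod, qc, by = "USUBJID", all = TRUE, suffixes = c("_PROD", "_QC"))
cmp$DIFF_TRTEDT <- cell_differs(cmp$TRTEDT_PROD, cmp$TRTEDT_QC)
cmp$DIFF_SAFFL  <- cell_differs(cmp$SAFFL_PROD, cmp$SAFFL_QC)

# For contrast: the naive comparison is NA, not TRUE, for S-003
naive <- cmp$TRTEDT_PROD != cmp$TRTEDT_QC

diffs <- cmp[cmp$DIFF_TRTEDT | cmp$DIFF_SAFFL, ]
print(diffs)
```

Here only S-003 is reported: production has an end date and QC does not. S-002, missing on both sides, counts as a match.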

6.2 SAS vs R

SAS — 02_qc_adsl.sas
/*=============================================================================
Program:     02_qc_adsl.sas
Purpose:     Independent QC build of ADSL and compare to production ADSL.
Inputs:      sdtm.dm, sdtm.ex, adam.adsl
Outputs:     qc.adsl_qc, comparison report CSV
QC Approach: Re-derive key fields via independent logic, then PROC COMPARE.
=============================================================================*/

%macro assert_exist(ds);
  %if not %sysfunc(exist(&ds)) %then %do;
    %put ERROR: Missing required dataset: &ds;
    %abort cancel;
  %end;
%mend;

%assert_exist(adam.adsl);

/* Independent rebuild (intentionally coded differently than production) */
proc sql;
  create table qc.dm_qc as
  select
    STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC, AGE, AGEU, SEX, RACE, ARM
  from sdtm.dm
  where STUDYID="&STUDYID";
quit;

proc sql;
  create table qc.ex_qc as
  select STUDYID, USUBJID, EXTRT, EXSTDTC, EXENDTC
  from sdtm.ex
  where STUDYID="&STUDYID";
quit;

proc sort data=qc.ex_qc; by USUBJID EXSTDTC; run;

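/* Different method: PROC SQL aggregation for dates, HAVING for first treatment.
   (PROC SQL has no FETCH FIRST and no ORDER BY inside a subquery.)
   Note: MAX end date matches production's "last non-missing end date" only
   when end dates rise with start dates; a mismatch is a spec question. */
proc sql;
  create table qc.ex_summ_qc as
  select d.USUBJID, d.TRTSDT, d.TRTEDT, f.TRT01P
  from (select USUBJID,
               min(input(substr(EXSTDTC,1,10), yymmdd10.)) as TRTSDT format=yymmdd10.,
               max(input(substr(EXENDTC,1,10), yymmdd10.)) as TRTEDT format=yymmdd10.
        from qc.ex_qc
        group by USUBJID) as d
       left join
       /* planned trt = EXTRT on the earliest EXSTDTC */
       (select distinct USUBJID, EXTRT as TRT01P length=20
        from qc.ex_qc
        group by USUBJID
        having EXSTDTC = min(EXSTDTC)) as f
       on d.USUBJID = f.USUBJID;
quit;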

proc sort data=qc.dm_qc;      by USUBJID; run;
proc sort data=qc.ex_summ_qc; by USUBJID; run;

data qc.adsl_qc;
  merge qc.dm_qc(in=a) qc.ex_summ_qc(in=b);
  by USUBJID;
  length TRT01A $20 SAFFL $1;
  format RANDDT TRTSDT TRTEDT yymmdd10.;
  if not a then delete;

  RANDDT = input(substr(RFSTDTC,1,10), yymmdd10.);
  SAFFL  = ifc(b and not missing(TRTSDT), "Y", "N");
  TRT01A = TRT01P;

  keep STUDYID USUBJID SITEID SUBJID ARM RANDDT TRT01P TRT01A TRTSDT TRTEDT
       SAFFL AGE AGEU SEX RACE;
run;

/* Compare QC vs PROD (sort a copy: QC must not modify the production dataset) */
proc sort data=adam.adsl out=work.adsl_prod; by USUBJID; run;
proc sort data=qc.adsl_qc;                   by USUBJID; run;

proc compare base=work.adsl_prod compare=qc.adsl_qc out=qc.adsl_compare_out outnoequal noprint;
  id USUBJID;
run;

/* Create a compact QC summary artifact (always exactly one row) */
proc sql noprint;
  select count(*) into :n_prod trimmed from adam.adsl;
  select count(*) into :n_qc   trimmed from qc.adsl_qc;
  select count(*) into :n_diff trimmed from qc.adsl_compare_out;
quit;

data qc.adsl_qc_summary;
  n_prod        = &n_prod;
  n_qc          = &n_qc;
  n_differences = &n_diff;  /* rows in the PROC COMPARE OUT= dataset */
run;

proc export data=qc.adsl_qc_summary
  outfile="&QC_DIR/adsl_qc_summary.csv"
  dbms=csv replace;
run;

R — 02_qc_adsl.R
# =============================================================================
# Program:  02_qc_adsl.R
# Purpose:  Independent QC build of ADSL and compare to production ADSL.
# Inputs:   dm.csv, ex.csv, adsl.csv
# Outputs:  qc_adsl.csv, qc_compare.csv, qc_summary.csv
# QC:       Re-derive using different approach; compare row/col equality.
# =============================================================================

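# Explicit format: partial or malformed dates become NA instead of raising an error
iso_to_date <- function(x) as.Date(substr(as.character(x), 1, 10), format = "%Y-%m-%d")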

adsl_prod <- readr::read_csv(paths$adam_adsl, show_col_types = FALSE)

dm_qc <- readr::read_csv(paths$sdtm_dm, show_col_types = FALSE) |>
  dplyr::filter(.data$STUDYID == study_id) |>
  dplyr::select(STUDYID, USUBJID, SITEID, SUBJID, RFSTDTC, RFENDTC, BRTHDTC,
                AGE, AGEU, SEX, RACE, ARM)

ex_qc <- readr::read_csv(paths$sdtm_ex, show_col_types = FALSE) |>
  dplyr::filter(.data$STUDYID == study_id) |>
  dplyr::select(STUDYID, USUBJID, EXTRT, EXSTDTC, EXENDTC)

# Different approach than production: aggregate min/max and pick first
# treatment by earliest EXSTDTC
ex_summ_qc <- ex_qc |>
  dplyr::arrange(.data$USUBJID, .data$EXSTDTC) |>
  dplyr::group_by(.data$USUBJID) |>
  dplyr::summarise(
    TRTSDT = {
      start_dates <- iso_to_date(.data$EXSTDTC)
      # Guard the all-missing case: min(..., na.rm = TRUE) would return Inf
      if (all(is.na(start_dates))) as.Date(NA) else min(start_dates, na.rm = TRUE)
    },
    TRTEDT = {
      end_dates <- iso_to_date(.data$EXENDTC)
      if (all(is.na(end_dates))) as.Date(NA) else max(end_dates, na.rm = TRUE)
    },
    TRT01P = dplyr::first(.data$EXTRT),
    .groups = "drop"
  )

adsl_qc <- dm_qc |>
  dplyr::left_join(ex_summ_qc, by = "USUBJID") |>
  dplyr::mutate(
    RANDDT = iso_to_date(.data$RFSTDTC),
    SAFFL  = dplyr::if_else(!is.na(.data$TRTSDT), "Y", "N"),
    TRT01A = .data$TRT01P
  ) |>
  dplyr::select(
    STUDYID, USUBJID, SITEID, SUBJID, ARM,
    RANDDT, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL,
    AGE, AGEU, SEX, RACE
  ) |>
  dplyr::arrange(.data$USUBJID)

adsl_prod2 <- adsl_prod |>
  dplyr::arrange(.data$USUBJID)

# Compare: full join and flag diffs at cell level (compact)
cmp <- adsl_prod2 |>
  dplyr::full_join(adsl_qc, by = "USUBJID", suffix = c("_PROD", "_QC"))

# Identify differences for key derived variables (expandable)
key_vars <- c("RANDDT", "TRT01P", "TRT01A", "TRTSDT", "TRTEDT", "SAFFL")

# NA-safe cell comparison: two NAs match; NA against a value is a difference.
# Values are compared as character so Date and text columns are handled alike.
cell_differs <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
}

# Row-wise diff flags, one DIFF_<variable> column per key variable
row_diff <- cmp
for (v in key_vars) {
  row_diff[[paste0("DIFF_", v)]] <-
    cell_differs(cmp[[paste0(v, "_PROD")]], cmp[[paste0(v, "_QC")]])
}

# Keep only subjects with at least one difference
row_diff <- row_diff |>
  dplyr::mutate(
    DIFF_ANY = dplyr::if_any(dplyr::starts_with("DIFF_"), ~ .x)
  ) |>
  dplyr::filter(.data$DIFF_ANY) |>
  dplyr::select(USUBJID, dplyr::starts_with("DIFF_"),
                dplyr::ends_with("_PROD"), dplyr::ends_with("_QC"))

readr::write_csv(adsl_qc, file.path(paths$qc_dir, "adsl_qc_r.csv"))
readr::write_csv(row_diff, file.path(paths$qc_dir, "adsl_compare_r.csv"))

qc_summary <- data.frame(
  n_prod = nrow(adsl_prod2),
  n_qc = nrow(adsl_qc),
  n_diff = nrow(row_diff),
  stringsAsFactors = FALSE
)

readr::write_csv(qc_summary, file.path(paths$qc_dir, "adsl_qc_summary_r.csv"))

What this demonstrates (FDA reviewer lens)

  • QC independence: production and QC derivations are coded differently, reducing shared-mode failure risk.
  • Deterministic comparison: explicit compare outputs make discrepancies reviewable and auditable.
  • Run artifacts: QC summaries and difference listings are created as persistent records of the QC process.


7 Section D — “Submission-grade” completeness checklist

This example includes the core elements that distinguish “regulated” code from ordinary analytics.

  • Controlled entrypoint (driver)
  • Parameterized study ID and paths
  • Input dataset and variable existence checks
  • Deterministic dataset build logic
  • Hard-stop integrity checks (keys, duplicates, consistency)
  • Independent QC build with a different derivation approach
  • Compare outputs and persistent QC artifacts (CSV summaries)