Introduction to the Kidney Dataset NanoStringGeomxSet Object

Nicole Ortogero

2021-02-26

Introduction

R data objects of the GeoMxSet class are provided with corresponding expression data tables pre-loaded for user convenience. The GeoMxSet class is an S4 data structure in current development at NanoString. This vignette gives a short introduction on how to work with the object as well as quick look at analysis on the object. Please refer to the readme and introduction documents and presentations for information about the kidney dataset and how this data was collected.

Kidney GeoMxSet objects

There are three different data objects representing data at different steps of the data processing pipeline.

  1. kidney_probe - probe level data, there can be several probes per gene (a.k.a target)
  2. kidney_target - target level data, where probe level information has been aggregated by geometric mean to give one count per gene. Note: The kidney data set was collected using the NanoString Whole Transcriptome Atlas, which has only one probe per gene. This results in the only aggregated feature being the negative probes.
  3. kidney_norm - target QC’d, filtered, and quantile normalized (Q3) target level count data
featureType(kidney_probe)
#> [1] "Probe"
featureType(kidney_target)
#> [1] "Target"
featureType(kidney_norm)
#> [1] "Target"

dim(kidney_probe)
#> Features  Samples 
#>    18642      231
dim(kidney_target)
#> Features  Samples 
#>    18504      231
dim(kidney_norm)
#> Features  Samples 
#>    16084      231

Subsetting by sample (column) or feature (row)

Within each object there are sample (region of interest) and feature (probe or target) annotations that can be used to subset the object by. This can be thought of as subsetting by column and row, respectively. As an example, we can remove any features labeled as control targets and split objects by normal vs disease.

tail(svarLabels(kidney_norm))
#> [1] "AOINucleiCount" "ROICoordinateX" "ROICoordinateY" "disease_status"
#> [5] "pathology"      "region"
fvarLabels(kidney_norm)
#> [1] "TargetName" "Module"     "CodeClass"  "ProbeID"    "Negative"

control_removed <- kidney_norm[fData(kidney_norm)[["CodeClass"]] == "Endogenous01", ]
dim(control_removed[, sData(control_removed)[["disease_status"]] == "normal"])
#> Features  Samples 
#>    16083      100
dim(control_removed[, sData(control_removed)[["disease_status"]] == "DKD"])
#> Features  Samples 
#>    16083      131
rm(control_removed)

Applying functions across the kidney data objects

Functions such as quantile normalization can be performed on GeoMxSet objects and subsequent results can be saved to the current object as an assayDataElement. Expression data are accessible through accessor functions and replacement requires the same dimensions as well as sample and feature names to match.

targets_qc_removed <- setdiff(featureNames(kidney_target), featureNames(kidney_norm))
qc_filtered <- kidney_target[!featureNames(kidney_target) %in% targets_qc_removed, ]
quantiles <- esApply(qc_filtered, 2, quantile, probs=0.75)
avg_quantiles <- ngeoMean(quantiles)
norm_factors <- quantiles / avg_quantiles
normed <- as.matrix(exprs(qc_filtered)) %*% diag(1 / norm_factors)
colnames(normed) <- sampleNames(qc_filtered)
try(assayDataElement(kidney_target, "norm") <- normed)
#> Error in .validate_assayDataElementReplace(obj, value) : 
#>   object and replacement value have different dimensions
assayDataElement(qc_filtered, "norm") <- normed
assayDataElement(qc_filtered, "norm")[1:2, 1:2]
#>      DSP-1001250007851-H-A02.dcc DSP-1001250007851-H-A03.dcc
#> A1BG                    7.228423                     6.56648
#> A1CF                   12.232716                    12.31215
exprs(qc_filtered)[1:2, 1:2]
#>      DSP-1001250007851-H-A02.dcc DSP-1001250007851-H-A03.dcc
#> A1BG                          13                           8
#> A1CF                          22                          15
assayDataElement(qc_filtered, "exprs") <- normed
exprs(qc_filtered)[1:2, 1:2]
#>      DSP-1001250007851-H-A02.dcc DSP-1001250007851-H-A03.dcc
#> A1BG                    7.228423                     6.56648
#> A1CF                   12.232716                    12.31215
rm(qc_filtered)

Comparing diabetic kidney disease and normal glomeruli

Differential expression is easily performed on the kidney data objects with packages such as limma. Designs can be appended to the objects to consistently pass the differential expression design through the analysis process. With the kidney dataset we can use these methods to compare diabetic kidney disease (DKD) glomeruli to normal and find the top 5 features (genes) that separate DKD from normal glomeruli.

glomeruli <- kidney_norm[, sData(kidney_norm)[["region"]] == "glomerulus"]
design(glomeruli) <- ~ `disease_status`
fit <- lmFit(exprs(glomeruli), model.matrix(design(glomeruli), sData(glomeruli)))
fit <- eBayes(fit)
DE_table <- topTable(fit, coef=2, adjust="fdr", n=5)
DE_table
#>             logFC   AveExpr        t      P.Value    adj.P.Val        B
#> DSTN     66.51779  83.72305 20.11222 1.794371e-44 2.886066e-40 86.85895
#> PCOLCE2  41.56335  41.57211 19.72293 1.518958e-43 8.195688e-40 84.92020
#> MAL2     45.76416  52.94508 19.72177 1.528666e-43 8.195688e-40 84.91441
#> VEGFA   222.79256 245.03567 18.39332 2.579344e-40 1.037154e-36 78.13117
#> CDKN1C  172.68002 163.29339 18.34772 3.341495e-40 1.074892e-36 77.89382
DE_glomeruli <- glomeruli[rownames(DE_table), ]
fData(DE_glomeruli)[["CodeClass"]] <- gsub("\\d+$", "", fData(DE_glomeruli)[["CodeClass"]])
heat <- autoplot(DE_glomeruli, type = "heatmap-genes",  elt = "exprs",
                        log2scale = FALSE, heatmapGroup = "disease_status")
heat

sessionInfo()
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04 LTS
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] limma_3.46.0              NanoStringHackathon_0.0.1
#>  [3] GeomxTools_0.99.3         NanoStringNCTools_0.99.8 
#>  [5] ggplot2_3.3.3             S4Vectors_0.28.1         
#>  [7] Biobase_2.50.0            BiocGenerics_0.36.0      
#>  [9] usethis_2.0.1             devtools_2.0.2           
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.3.1         pkgload_1.2.0      jsonlite_1.7.2     bslib_0.2.4       
#>  [5] assertthat_0.2.1   highr_0.8          vipor_0.4.5        cellranger_1.1.0  
#>  [9] outliers_0.14      yaml_2.2.1         remotes_2.0.4      sessioninfo_1.1.1 
#> [13] gdtools_0.2.3      pillar_1.5.0       glue_1.4.2         digest_0.6.27     
#> [17] RColorBrewer_1.1-2 XVector_0.30.0     colorspace_2.0-0   plyr_1.8.6        
#> [21] htmltools_0.5.1.1  pkgconfig_2.0.3    pheatmap_1.0.12    zlibbioc_1.36.0   
#> [25] purrr_0.3.4        scales_1.1.1       processx_3.4.5     tibble_3.1.0      
#> [29] farver_2.0.3       generics_0.1.0     IRanges_2.24.1     ellipsis_0.3.1    
#> [33] withr_2.4.1        cli_2.3.1          magrittr_2.0.1     crayon_1.4.1      
#> [37] readxl_1.3.1       memoise_1.1.0      evaluate_0.14      ps_1.5.0          
#> [41] fs_1.5.0           fansi_0.4.2        xml2_1.3.2         beeswarm_0.2.3    
#> [45] pkgbuild_1.2.0     ggthemes_4.1.1     tools_4.0.4        data.table_1.14.0 
#> [49] prettyunits_1.1.1  lifecycle_1.0.0    stringr_1.4.0      munsell_0.5.0     
#> [53] EnvStats_2.4.0     callr_3.5.1        Biostrings_2.58.0  compiler_4.0.4    
#> [57] jquerylib_0.1.3    systemfonts_1.0.1  rlang_0.4.10       grid_4.0.4        
#> [61] rjson_0.2.20       htmlwidgets_1.5.3  rmarkdown_2.7      testthat_3.0.2    
#> [65] gtable_0.3.0       curl_4.3           DBI_1.1.1          reshape2_1.4.4    
#> [69] R6_2.5.0           knitr_1.31         dplyr_1.0.4        ggiraph_0.6.1     
#> [73] utf8_1.1.4         rprojroot_2.0.2    desc_1.2.0         stringi_1.5.3     
#> [77] ggbeeswarm_0.6.0   Rcpp_1.0.6         vctrs_0.3.6        tidyselect_1.1.0  
#> [81] xfun_0.21