---
title: "20201019 UGA4 Metadata"
author: "Emily Adney"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    df_print: paged
    toc: yes
  html_notebook:
    fig_caption: yes
    number_sections: yes
    toc: yes
---
The goal is to examine the metadata of all participants in the UGA4 cohort,
together with the hemagglutination inhibition (HAI) data for Day 0 (D0) and Day 28 (D28).

(HAI data for all cohorts are maintained live at https://www.synapse.org/#!Synapse:syn21777633)
Six participants do not have available HAI data and have been removed.
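The notebook does not show how the six participants were removed. A minimal base-R sketch of such a filter, assuming the HAI titers sit in columns whose names are passed in as a vector and that missing values are read as `NA` (as `read_csv()` does for empty cells). The column names `D0_HAI`/`D28_HAI`, the helper name `drop_missing_hai`, and the toy data are all hypothetical.

```r
# Hypothetical toy data: two HAI titer columns with some missing values.
meta <- data.frame(
  UGA_ID  = c("P01", "P02", "P03", "P04", "P05"),
  D0_HAI  = c(10, NA, 40, 20, 80),    # hypothetical column name
  D28_HAI = c(160, 40, NA, 80, 320),  # hypothetical column name
  stringsAsFactors = FALSE
)

# Keep only participants with a non-missing value in every listed HAI column.
drop_missing_hai <- function(df, hai_cols) {
  has_all <- complete.cases(df[, hai_cols, drop = FALSE])
  message(sum(!has_all), " participant(s) removed for missing HAI data")
  df[has_all, , drop = FALSE]
}

meta_hai <- drop_missing_hai(meta, c("D0_HAI", "D28_HAI"))
meta_hai$UGA_ID
```

Reporting the number of dropped rows with `message()` keeps the removal visible in the knitted output.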

So far we have chosen 160 participants for the transcriptomics work done in Year 1.
We would like to choose 160 more for Year 2.
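The criteria for the Year 2 selection are not stated here. As one possibility only, a simple random draw from the participants not already used in Year 1 could look like the sketch below; a stratified draw (for example, balanced on seroconversion status) could be substituted. The helper name `select_year2` and the toy IDs are hypothetical.

```r
# Hypothetical toy IDs standing in for the cohort after HAI filtering.
all_ids   <- sprintf("P%02d", 1:20)
year1_ids <- sprintf("P%02d", c(2, 3, 5, 7, 11, 13, 17, 19))  # hypothetical Year 1 picks

# Randomly draw n_select participants from those not already chosen in Year 1.
select_year2 <- function(all_ids, year1_ids, n_select, seed = 1) {
  candidates <- setdiff(all_ids, year1_ids)
  if (n_select > length(candidates)) {
    stop("Only ", length(candidates), " candidates remain; cannot select ", n_select)
  }
  set.seed(seed)  # fixed seed for a reproducible draw (note: resets the global RNG state)
  sort(sample(candidates, n_select))
}

year2_ids <- select_year2(all_ids, year1_ids, n_select = 8)
year2_ids
```

Fixing the seed means the same Year 2 list is produced every time the notebook is knitted.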

We are going to analyze all 362 samples.

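```{r}
# tidyverse attaches readr, tidyr, dplyr and ggplot2, so those packages
# need no separate library() calls.
library(tidyverse)
library(reshape2)    # melt(), used below to reshape the serostatus columns
library(wesanderson) # colors
myCol <- wes_palette("Moonrise3", 4)
```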

For the file used here (20201019_UGA4_Metadata.csv), some of the column headers are described in this image:
![Description of selected column headers](Columns.jpg)

```{r}
UGA4_Metadata <- read_csv("20201019_UGA4_Metadata.csv")
dim(UGA4_Metadata)
str(UGA4_Metadata)
```
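A small sanity check that could follow `read_csv()`, assuming each row is one participant identified by `UGA_ID`. The helper name `check_metadata` and the toy table are hypothetical.

```r
# Return a character vector describing problems; an empty vector means all checks passed.
check_metadata <- function(df, id_col, expected_n) {
  ids <- df[[id_col]]
  problems <- character(0)
  if (nrow(df) != expected_n) {
    problems <- c(problems, sprintf("expected %d rows, found %d", expected_n, nrow(df)))
  }
  dup <- unique(ids[duplicated(ids)])
  if (length(dup) > 0) {
    problems <- c(problems, paste("duplicated IDs:", paste(dup, collapse = ", ")))
  }
  if (anyNA(ids)) {
    problems <- c(problems, "missing IDs")
  }
  problems
}

# Hypothetical toy table with one duplicated ID.
toy <- data.frame(UGA_ID = c("A1", "A2", "A2", "A3"), stringsAsFactors = FALSE)
check_metadata(toy, "UGA_ID", expected_n = 4)
```

On the real table one might call `check_metadata(UGA4_Metadata, "UGA_ID", 362)`, assuming each of the 362 samples mentioned above corresponds to one row.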

Display the full metadata table and a summary of each column:

```{r}
knitr::kable(UGA4_Metadata)
summary(UGA4_Metadata)
```

```{r}
# Distribution of participant age
ggplot(data = UGA4_Metadata, mapping = aes(x = Age)) +
  geom_histogram()
```

```{r}
# Distribution of BMI
ggplot(data = UGA4_Metadata, mapping = aes(x = BMI_Value)) +
  geom_histogram()
```

```{r}
# Distribution of Seroconversion_TedsScore_AllStrains
ggplot(data = UGA4_Metadata, mapping = aes(x = Seroconversion_TedsScore_AllStrains)) +
  geom_histogram()
```

```{r}
# Number of participants in each BMI category
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = BMI_Category))
```

```{r}
# Number of participants in each SeroConversion_AllStrains category
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = SeroConversion_AllStrains))
```

```{r}
# Age versus BMI, colored by gender
ggplot(data = UGA4_Metadata, mapping = aes(x = Age, y = BMI_Value)) +
  geom_point(aes(color = Gender))
```

```{r}
# Number of participants in each Day 0 all-strains serostatus category
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = D0_ALLSTRAINS_SEROSTATUS))
```

Day 0 Serostatus by Strain

```{r}
# Keep the four baseline (Day 0) serostatus columns, shortening their names
plotdata <- UGA4_Metadata %>%
  select(UGA_ID,
         D0_Sero_H1N1 = Baseline_Serostatus_H1N1,
         D0_Sero_H3N2 = Baseline_Serostatus_H3N2,
         D0_Sero_Yama = Baseline_Serostatus_Yamagata,
         D0_Sero_Vict = Baseline_Serostatus_Victoria) %>%
  # Wide -> long: one row per participant and strain
  melt(id.vars = "UGA_ID") %>%
  # Count participants in each serostatus category per strain
  group_by(variable, value) %>%
  summarize(n = n(), .groups = "drop")

ggplot(plotdata, aes(x = variable, y = n, fill = value)) +
  geom_col(position = "stack") +
  ylab("count") +
  xlab("")
```
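To make the reshape-and-count step concrete, here is a base-R sketch on a hypothetical three-participant table: `stack()` stands in for `melt()`, and `table()` for the `group_by()`/`summarize()` count. The "Positive"/"Negative" labels are placeholders, not the values in the real file. One difference: `table()` also reports zero counts, whereas the dplyr pipeline omits empty combinations.

```r
# Hypothetical wide table: one row per participant, one serostatus column per strain.
sero_d0 <- data.frame(
  UGA_ID       = c("P01", "P02", "P03"),
  D0_Sero_H1N1 = c("Positive", "Negative", "Positive"),
  D0_Sero_H3N2 = c("Positive", "Positive", "Positive"),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (participant, strain) pair.
long <- stack(sero_d0[, -1])   # stack() returns columns `values` and `ind`

# Count participants in each serostatus category per strain.
tab <- table(variable = long$ind, value = long$values)
counts <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
counts
```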

Day 28 Serostatus by Strain

```{r}
# Keep the four Day 28 serostatus columns, shortening their names
plotdata <- UGA4_Metadata %>%
  select(UGA_ID,
         D28_Sero_H1N1 = D28_Serostatus_H1N1,
         D28_Sero_H3N2 = D28_Serostatus_H3N2,
         D28_Sero_Yama = D28_Serostatus_Yamagata,
         D28_Sero_Vict = D28_Serostatus_Victoria) %>%
  # Wide -> long: one row per participant and strain
  melt(id.vars = "UGA_ID") %>%
  # Count participants in each serostatus category per strain
  group_by(variable, value) %>%
  summarize(n = n(), .groups = "drop")

ggplot(plotdata, aes(x = variable, y = n, fill = value)) +
  geom_col(position = "stack") +
  ylab("count") +
  xlab("")
```
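The Day 0 and Day 28 chunks repeat the same pipeline with different columns. One way to remove the duplication, sketched here in base R, is a small helper (the name `serostatus_counts` is hypothetical) that returns per-strain counts tagged with a `Day` column, so both time points can be stacked into one table. It assumes the serostatus columns are character vectors, as `read_csv()` returns for text, and that the strain name follows `_Serostatus_` in each column name, as in the chunks above. The toy data and "Positive"/"Negative" labels are placeholders.

```r
# Count serostatus categories per strain for one time point.
# stack() only keeps plain vector columns, so factor columns would be skipped.
serostatus_counts <- function(df, cols, day) {
  long <- stack(df[, cols, drop = FALSE])
  strain <- sub("^.*_Serostatus_", "", as.character(long$ind))
  tab <- table(strain = strain, value = long$values)
  out <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
  out$Day <- day
  out[out$n > 0, c("Day", "strain", "value", "n")]  # drop empty combinations
}

# Hypothetical toy data using two of the strain columns named above.
toy <- data.frame(
  UGA_ID                   = c("P01", "P02", "P03"),
  Baseline_Serostatus_H1N1 = c("Negative", "Negative", "Positive"),
  Baseline_Serostatus_H3N2 = c("Positive", "Negative", "Positive"),
  D28_Serostatus_H1N1      = c("Positive", "Negative", "Positive"),
  D28_Serostatus_H3N2      = c("Positive", "Positive", "Positive"),
  stringsAsFactors = FALSE
)

both_days <- rbind(
  serostatus_counts(toy, c("Baseline_Serostatus_H1N1", "Baseline_Serostatus_H3N2"), "D0"),
  serostatus_counts(toy, c("D28_Serostatus_H1N1", "D28_Serostatus_H3N2"), "D28")
)
both_days
```

With both days in one table, a single faceted chart such as `ggplot(both_days, aes(x = strain, y = n, fill = value)) + geom_col() + facet_wrap(~ Day)` could replace the two separate plots.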