---
title: "20201019 UGA4 Metadata"
author: "Emily Adney"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    df_print: paged
    toc: yes
  html_notebook:
    fig_caption: yes
    number_sections: yes
    toc: yes
---
The goal is to examine the metadata of all participants in the UGA4 cohort,
together with the HAI data for Day 0 (D0) and Day 28.
(HAI data for all cohorts is kept live at https://www.synapse.org/#!Synapse:syn21777633)
Six participants do not have available HAI data and have been removed.
So far, 160 participants have been chosen for the transcriptomics work done in Year 1.
We would like to choose 160 more for Year 2.
We are going to analyze all 362 samples.
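
As an illustration of the exclusion step described above, the following is a minimal sketch of how participants with missing HAI data could be dropped. It runs on toy data, and the column names `D0_HAI` and `D28_HAI` are hypothetical placeholders rather than the actual headers in the metadata file.

```r
library(dplyr)

# Toy data; D0_HAI and D28_HAI are hypothetical column names.
hai <- tibble(
  UGA_ID  = c("P1", "P2", "P3", "P4"),
  D0_HAI  = c(40, NA, 10, 80),
  D28_HAI = c(160, 320, NA, 80)
)

# Keep only participants with a titer at both time points.
hai_complete <- hai %>%
  filter(!is.na(D0_HAI), !is.na(D28_HAI))

# Number of participants dropped because HAI data were missing.
n_removed <- nrow(hai) - nrow(hai_complete)
```

Recording `n_removed` alongside the filtered table makes it easy to confirm that the number of excluded participants matches the count reported in the text.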
```{r}
library(tidyverse)    # attaches readr, tidyr, dplyr and ggplot2
library(reshape2)     # melt() for wide-to-long reshaping
library(wesanderson)  # color palettes
myCol <- wes_palette("Moonrise3", 4)
```
For the file being used (20201019_UGA4_Metadata.csv), this is a description of some of the column headers:
![Column header descriptions](Columns.jpg)
```{r}
UGA4_Metadata <- read_csv("20201019_UGA4_Metadata.csv")
dim(UGA4_Metadata)
str(UGA4_Metadata)
```
Generate a table from the CSV file:
```{r}
knitr::kable(UGA4_Metadata)
summary(UGA4_Metadata)
```
```{r}
ggplot(data = UGA4_Metadata, mapping = aes(x = Age)) +
  geom_histogram()
```
```{r}
ggplot(data = UGA4_Metadata, mapping = aes(x = BMI_Value)) +
  geom_histogram()
```
```{r}
ggplot(data = UGA4_Metadata, mapping = aes(x = Seroconversion_TedsScore_AllStrains)) +
  geom_histogram()
```
```{r}
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = BMI_Category))
```
```{r}
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = SeroConversion_AllStrains))
```
```{r}
ggplot(data = UGA4_Metadata, mapping = aes(x = Age, y = BMI_Value)) +
  geom_point(aes(color = Gender))
```
```{r}
ggplot(data = UGA4_Metadata) +
  geom_bar(mapping = aes(x = D0_ALLSTRAINS_SEROSTATUS))
```
Day 0 SeroStatus by Strain
```{r}
# Count participants per Day 0 serostatus value for each strain
plotdata <- UGA4_Metadata %>%
  select(UGA_ID, Baseline_Serostatus_H1N1, Baseline_Serostatus_H3N2,
         Baseline_Serostatus_Yamagata, Baseline_Serostatus_Victoria) %>%
  rename(D0_Sero_H1N1 = Baseline_Serostatus_H1N1,
         D0_Sero_H3N2 = Baseline_Serostatus_H3N2,
         D0_Sero_Yama = Baseline_Serostatus_Yamagata,
         D0_Sero_Vict = Baseline_Serostatus_Victoria) %>%
  melt(id = "UGA_ID") %>%          # wide to long: one row per participant and strain
  group_by(variable, value) %>%
  summarize(n = n())               # named count column, so no backticks are needed below

# Stacked bars: one bar per strain, filled by serostatus
ggplot(plotdata, aes(fill = value, y = n, x = variable)) +
  geom_bar(position = "stack", stat = "identity") +
  ylab("count") +
  xlab("")
```
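
The reshape-and-count step above relies on `reshape2::melt()`. As a hedged alternative, the same long-format counts can be sketched with `tidyr::pivot_longer()` and `dplyr::count()`. The example below uses a small toy table; the serostatus labels ("Positive", "Negative") are assumptions for illustration and may not match the coding used in the metadata file.

```r
library(dplyr)
library(tidyr)

# Toy data; the serostatus labels are hypothetical.
sero <- tibble(
  UGA_ID = c("P1", "P2", "P3"),
  Baseline_Serostatus_H1N1 = c("Positive", "Negative", "Positive"),
  Baseline_Serostatus_H3N2 = c("Negative", "Negative", "Positive")
)

# Wide to long: one row per participant and strain,
# then count participants per strain and serostatus value.
plotdata_long <- sero %>%
  pivot_longer(cols = -UGA_ID, names_to = "variable", values_to = "value") %>%
  count(variable, value)
```

The result has the columns `variable`, `value` and `n`, so it can be passed to the same stacked `geom_bar()` call with `y = n`.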
Day 28 SeroStatus by Strain
```{r}
# Count participants per Day 28 serostatus value for each strain
plotdata <- UGA4_Metadata %>%
  select(UGA_ID, D28_Serostatus_H1N1, D28_Serostatus_H3N2,
         D28_Serostatus_Yamagata, D28_Serostatus_Victoria) %>%
  rename(D28_Sero_H1N1 = D28_Serostatus_H1N1,
         D28_Sero_H3N2 = D28_Serostatus_H3N2,
         D28_Sero_Yama = D28_Serostatus_Yamagata,
         D28_Sero_Vict = D28_Serostatus_Victoria) %>%
  melt(id = "UGA_ID") %>%          # wide to long: one row per participant and strain
  group_by(variable, value) %>%
  summarize(n = n())               # named count column, so no backticks are needed below

# Stacked bars: one bar per strain, filled by serostatus
ggplot(plotdata, aes(fill = value, y = n, x = variable)) +
  geom_bar(position = "stack", stat = "identity") +
  ylab("count") +
  xlab("")
```