I’m excited to welcome Phil Douchinsky to the Puckett Lab! Phil had been working as a research technician on a conservation genomics project looking at gene flow in fish across the landscape. For his MS, he will compare and contrast isolation by resistance of black (Ursus americanus) and brown (U. arctos) bears across the Southeastern Alaskan landscape.
Puckett Lab Hosts Mark Edwards for Seminar
The Puckett Lab hosted Dr. Mark Edwards for seminar. Dr. Edwards is the Curator of Mammalogy at the Royal Alberta Museum and studies the ecology of carnivores and large herbivores in northern temperate and Holarctic environments. His talk was titled, “Museum Collections Cast New Light on Spatial Ecology of Extant Species.”
Puckett Lab Hosts Loren Cassin Sackett for Seminar
The Puckett Lab hosted Dr. Loren Cassin Sackett for seminar. Dr. Cassin Sackett studies how introduced parasites affect genomic diversity and population persistence in native wildlife populations. Her talk was titled, “The consequences of introduced pathogens on host evolution” and focused on studies in prairie dogs and Hawai’i ‘amakihi.
Welcome to Heather Clendenin
Very excited to welcome Heather Clendenin to the Puckett Lab! Heather recently finished her MS at the University of Idaho where she investigated sibling relatedness in gray wolves (Canis lupus). For her PhD, she will estimate genetic load in black bear (Ursus americanus) populations with varying demographic histories.
Puckett Lab Hosts Marc Tollis for Seminar
The Puckett Lab hosted Dr. Marc Tollis for seminar. Dr. Tollis studies cancer evolution within mammals, transposable elements, and has broader interests in the systematics of vertebrates. His talk was titled, “Tempo and mode of Peto’s Paradox: Comparative genomics and the evolution of cancer suppression.”
Puckett Lab Hosts Greg Barsh for Seminar
The Puckett Lab hosted Dr. Greg Barsh for seminar. Dr. Barsh is a Faculty Investigator at HudsonAlpha and Professor at Stanford University. He studies the genetic basis of mammalian coat color and patterning (e.g. stripes, spots) variation. His talk was titled, “Stripes and spots: Genetics of mammalian color pattern.”
Puckett Lab Hosts Emily Latch for Seminar
The Puckett Lab hosted Dr. Emily Latch for seminar. Dr. Latch is an Associate Professor at the University of Wisconsin- Milwaukee. She studies phylogeography and landscape genetics of several mammal species, with an emphasis on using this information to inform management. Dr. Latch met with the students in the Urban Ecology & Wildlife Management class, toured Meeman Biological Field Station, and gave her talk, “Wild bison, hidden deer: Conservation Genetics for a changing world.”
Welcome to Matthew Pollard
Very excited to welcome Matthew Pollard to the Puckett Lab! Matthew recently finished his MS at Cardiff University where he was investigating the Y-chromosome of brown bears (Ursus arctos). For his PhD, he will work on comparative genomics of bears.
Puckett Lab Opening Fall 2018 at the University of Memphis
I am ecstatic to join the faculty at the University of Memphis as an Assistant Professor in the Biology Department. The Puckett Lab will open Fall 2018 and focus on phylogeography and evolutionary genomics within the bear family.
If you are interested in joining the lab, please see the “Positions in the Lab” page for current information on positions.
Making conStruct Input Files
As part of my postdoc with Gideon Bradburd, I’m using his new software package conStruct (bioRxiv; GitHub) to analyze dozens of genomic datasets. conStruct requires three input files: 1) genetic data, 2) coordinate data (longitude in the first column, latitude in the second), and 3) a pairwise distance matrix with the same number of sites as in the coordinate data. Files 2 and 3 are straight forward; but it took me a little time to be able to go from a regular STRUCTURE file to a conStruct file. So below is the R code I’m using to do this conversion.
Now if your data is not already in STRUCTURE two row format (i.e. two rows per sample), then you’ll need to get there as a starting place. I used PGDSpider to make the STRUCTURE files WITH a header row and a column to denote sampling site.
(PLINK, bless its heart, makes 1 row 1 column STRUCTURE files, and I’m not coding out of that.) Remember you want to denote sampling sites for conStruct and not putative populations. I then replaced the two blank headers for columns one and two with “SampleID” and “PopID.”
conStruct can take data as counts or frequencies. The code below makes a table of frequencies for one allele (doesn’t matter major or minor, derived or ancestral) for each sampling site for each locus.
I have written this as a loop to process multiple input files at once. You can remove the for loop and start at “str <- read.table()” if you only have one file.
setwd("Enter the path to your working directory") files <- list.files(pattern = "*.str",full.names=T) newnames <- paste(sep="",sub('.str', '',files),"-Processed.str") #Loop over all files and make the processed files needed for conStruct for(i in 1:length(files)){ #Read data file and convert missing data to NA str <- read.table(files[i],header=T) str[str == "-9"] <- NA str <- str[ order(str$PopID,str$SampleID),] #Count number of samples SampleID <- as.character(unique(str$SampleID)) #Looping over all loci, create a frequency table of alleles (0,1,2) #Jacob Burkhart wrote this loop count <- data.frame(SampleID) for(loci in 3:dim(str)[2]){ temp <- table(str$SampleID, str[,loci]) colnames(temp) <- paste0(colnames(str)[loci], "-", colnames(temp)) temp <- data.frame(unclass(temp)) #If there are no alleles, recode the row as -9 for(j in 1:dim(temp)[1]){ if(sum(temp[j,]) == 0) {temp[j,] <- NA} } #Test if a monomorphic locus slipped through your data processing #If so, column bind data to sample ID and any previous datasets #If not (as expected), then the column bind will be applied to the 2nd allele #Why the 2nd allele? Because any loci with missing data will result in data being added to the table count <- as.matrix(cbind(count,if(length(temp)==1){temp[,1]} else{temp[,2]})) } #Create a vector of the sampling site information for each sample pop.vec <- as.vector(str[,2]) pop.vec <- pop.vec[c(seq(from=1, to=nrow(str), by=2))] #Make variables to utilize below n.pops <- length(unique(pop.vec)) table.pops <- data.frame(table(pop.vec)) #Make a file of individual sample allele frequencies #If you only have one sample per sampling site, then you could stop here freq <- matrix(as.numeric(count[,-1])/2,nrow(count),ncol(count)-1) f <- matrix(as.numeric(freq),nrow(freq),ncol(freq)) #Empty matrix for sampling site level calculations admix.props <- matrix(NA, n.pops,ncol(f)) #Calculate frequency (of 2nd allele) per sampling site #The last line tests if there is a sampling site with n=1 #If so, prints vector because frequency has already been calculated (0, 0.5, or 1) #If not, then calculates mean across samples from that site for(m in 1:length(table.pops$pop.vec)){ t<-as.factor(unique(pop.vec))[m] admix.props[m,] <- if(table.pops[table.pops$pop.vec == t,2] == 1){f[which(pop.vec==t),]} else{colMeans(f[which(pop.vec==t),],na.rm=T)} } #Export conStruct file and save in working directory write.table(admix.props, newnames[i],quote=F,sep="\t",row.names=F,col.names=F) }
As I noted in the code, my friend Jake Burkhart wrote the internal for loop that makes the frequency table. He originally wrote the loop to make pseudo-SNP datasets out of microsatellite data. Which means, if you want to run conStruct on a microsatellite dataset, you can print all of the loci (instead of just one of the biallelic SNPs), then keep processing the frequencies at each sampling site. Note, conStruct will throw an error if there are fewer loci than samples, which shows up more readily when using pseudo-SNP data from (even highly polymorphic) microsatellites.