I usually like statistics, but I've come to realize that I only like reading them; I don't like calculating them. I've spent most of my school time over the past few days trying to figure out how to calculate confidence intervals and p-values for some data sets I've generated as part of my dissertation project. I don't know how to do this even for simple cases, and my data sets are far from simple.

Briefly, I've got dozens of trials, each of which produces, among other variables, a score for each tribe of animats (artificial life agents) in the trial. However, this score is actually the average of 1,000 intermediate scores taken at 1,000-second intervals (a total of 1,000,000 seconds per trial). Further, the animats in each tribe change over time as they are born and die: every 20,000 seconds, in each tribe, one animat dies and a new one is created. So there are 50 distinct (overlapping) sets of animats per tribe over the course of a trial, and complete turnover within a tribe every 200,000 seconds (since there are 10 animats per tribe). This yields five non-overlapping sets of animats per tribe per trial, but they aren't independent, because the whole point of the experiment is that animats transmit knowledge and behavior (culture) across generations.
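The arithmetic behind those counts can be sketched in a few lines of Python. The constant and variable names are illustrative labels, not names from the actual simulation code; only the numbers come from the description above.

```python
# Trial structure as described in the post (names are hypothetical labels).
TRIAL_SECONDS = 1_000_000       # total length of one trial
SAMPLE_INTERVAL = 1_000         # seconds between intermediate scores
REPLACEMENT_INTERVAL = 20_000   # seconds between one death/birth per tribe
TRIBE_SIZE = 10                 # animats per tribe

# Intermediate scores averaged into one tribe score per trial.
samples_per_trial = TRIAL_SECONDS // SAMPLE_INTERVAL

# Each replacement changes the tribe's membership, giving one new
# (overlapping) set of animats per replacement.
overlapping_sets = TRIAL_SECONDS // REPLACEMENT_INTERVAL

# Replacing all members one at a time takes TRIBE_SIZE replacements.
turnover_seconds = REPLACEMENT_INTERVAL * TRIBE_SIZE

# Sets of animats that share no members within one trial.
non_overlapping_sets = TRIAL_SECONDS // turnover_seconds

# Intermediate scores that fall inside one full-turnover window.
samples_per_window = turnover_seconds // SAMPLE_INTERVAL

print(samples_per_trial, overlapping_sets, turnover_seconds,
      non_overlapping_sets, samples_per_window)
```

The last figure is not stated above but follows from it: each full-turnover window covers 200 of the 1,000 intermediate scores, so a per-window score would be an average over 200 samples rather than 1,000.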

Whew. So, I have a bunch of data, and some experiments consist of only 20 trials. However, because of my setup, I think these 20 trials carry the statistical weight of nearly 100 shorter trials, thanks to the non-overlapping sets and multiple tribes per trial. I have calculated correlation coefficients between scores and other measurements, and between scores and various animat characteristics whose usefulness I want to test, but I am getting thoroughly lost as I try to figure out the statistical significance of my numbers, if any.
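For a single correlation coefficient, one standard textbook approach is the Fisher z-transformation, which gives an approximate p-value and confidence interval. The sketch below is only that: it assumes the n pairs are independent and roughly bivariate normal, which is exactly what is in doubt here, since the generations within a trial share culture. The function name and the example value r = 0.4 are hypothetical and not taken from the experiment. Running it with n = 20 (one observation per trial) and n = 100 (one per non-overlapping set) at least brackets how much the answer depends on which sample size is defensible.

```python
import math
from statistics import NormalDist


def correlation_inference(r, n, confidence=0.95):
    """Approximate two-sided p-value (H0: no correlation) and confidence
    interval for a Pearson correlation r from n independent pairs,
    using the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    z = math.atanh(r)                # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    std_normal = NormalDist()        # standard normal, mean 0 and sd 1

    # Two-sided p-value for the null hypothesis of zero correlation.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z) / se))

    # Interval on the z scale, mapped back to the r scale with tanh.
    z_crit = std_normal.inv_cdf(0.5 + confidence / 2.0)
    low = math.tanh(z - z_crit * se)
    high = math.tanh(z + z_crit * se)
    return p_value, (low, high)


# Hypothetical correlation, evaluated under both candidate sample sizes.
r_example = 0.4
for n in (20, 100):
    p, (low, high) = correlation_inference(r_example, n)
    print(f"n={n}: p={p:.4f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the five sets within a trial are strongly correlated, the honest effective sample size sits closer to 20 than to 100, so the n = 100 result should be read as an optimistic bound rather than a finding.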


