
Identification of Lithic Reduction Strategies from Mixed Assemblages

November 11, 2013

This post is the first in a series that will try to characterize lithic debitage assemblages formed from more than one reduction strategy. The primary goals are to estimate the proportions of the various reduction strategies represented within these mixed assemblages and to quantify the uncertainty of these estimates. I plan to use mixture models and the method of maximum likelihood to identify the distinct components of such assemblages.


Brown (2001) suggests that the distribution of debitage size follows a power law. Power law distributions have the following probability density function:

f(x \vert \alpha) = Cx^{\alpha} ,

where C is a constant that normalizes the distribution, so the density integrates to one. For \alpha < -1 and a minimum flake size x_{min} , integrating the density gives C = -(\alpha + 1)x_{min}^{-(\alpha + 1)} , so the value of C depends entirely on the exponent \alpha and on the smallest size considered.

Based on analysis of experimentally-produced assemblages, Brown further suggests that the exponent, \alpha , of these power law distributions varies among different reduction strategies. Thus, different reduction strategies produce distinctive debitage size distributions. This result could be very powerful, allowing reduction strategies from a wide variety of contexts to be characterized and distinguished. The technique used by Brown to estimate the value of the exponent, however, has some technical flaws.

Brown (2001) fits a linear regression to the relationship between the log of flake size grade and the log of the cumulative count of flakes in each size grade. In its favor, this approach seemingly reduces the effects of small sample sizes and can be easily replicated. The regression approach, however, produces biased estimates of the exponent and does not allow the fit of the power law model to be compared to the fit of other probability density functions.

Maximum likelihood estimates, using data on the size of each piece of debitage, produce more reliable estimates of the exponent of a power law. Maximum likelihood estimates can also be readily compared among different distributions fit to the data, to evaluate whether a power law is the best model to describe debitage size distributions. The next post will illustrate the use of the linear regression approach and the maximum likelihood approach on simulated data drawn from a power law distribution.
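As a preview of that comparison, here is a quick sketch (in Python rather than R, purely for illustration; the exponent, minimum size, and sample size are all invented). It draws data from a known power law and estimates the exponent two ways. The regression step is a simplified stand-in for the size-grade approach, fitting a line to the log of the empirical survival counts; note that alpha in the code is positive, the negative of the exponent as written in the density above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate n flake "sizes" from a continuous power law with density
# proportional to x^(-alpha) for x >= xmin (inverse-transform sampling)
alpha_true, xmin, n = 2.5, 1.0, 5000
x = xmin * (1.0 - rng.uniform(size=n)) ** (-1.0 / (alpha_true - 1.0))

# Maximum likelihood estimate of the exponent (closed form for this model)
alpha_mle = 1.0 + n / np.sum(np.log(x / xmin))

# Regression stand-in: fit a line to log survival counts vs. log size.
# For a power law, P(X >= x) falls off as x^-(alpha - 1).
xs = np.sort(x)
ccdf = 1.0 - np.arange(n) / n          # empirical P(X >= x), never zero
slope, _ = np.polyfit(np.log(xs), np.log(ccdf), 1)
alpha_reg = 1.0 - slope

print(f"true exponent:       {alpha_true}")
print(f"MLE estimate:        {alpha_mle:.3f}")
print(f"regression estimate: {alpha_reg:.3f}")
```

In repeated simulations of this kind, the maximum likelihood estimate stays tightly clustered around the true exponent, while the regression estimate tends to wander more, which is the bias problem noted above.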

Reference cited

Brown, Clifford T.
2001 The Fractal Dimensions of Lithic Reduction. Journal of Archaeological Science 28: 619-631.

© Scott Pletka and Mathematical Tools, Archaeological Problems, 2013


On Monument Volume IV

April 29, 2013

This post evaluates burial mound volume, fitting various probability models to the data. As noted previously, the exponential distribution seems like an appropriate model to fit to the mound volume data. This model is not the only possibility, of course, so I will also consider an alternative, the gamma distribution. The exponential distribution is a special case of the gamma distribution.

The gamma probability density function (pdf) is:

f(x\vert \alpha , \lambda) = \frac{\lambda ^{\alpha} x^{\alpha - 1}e^{- \lambda x}}{\Gamma (\alpha)} ,

where:

\alpha is the shape parameter,

\lambda is the rate parameter,

and \Gamma is the gamma function.

The gamma function is defined as:

\Gamma (\alpha) = \int_{0}^{\infty} t^{\alpha -1} e^{-t} dt
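Numerically, the gamma function behaves as a continuous extension of the factorial: for positive integers, \Gamma (n) = (n-1)! . A quick check, using Python's standard math library just for illustration:

```python
import math

# The gamma function extends the factorial: Gamma(n) = (n - 1)! for integers
for n in range(1, 7):
    assert math.gamma(n) == math.factorial(n - 1)

# A classic non-integer value: Gamma(1/2) equals the square root of pi
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12
print("gamma function checks pass")
```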

Depending on the parameter values, the graph of the gamma pdf can take a wide variety of shapes, including forms that resemble the bell-shaped curve of the normal distribution. The following illustration shows some of the possible variation.


To evaluate the relationship between mound volume and mound condition (plowed and whole) under the gamma and exponential distributions, I analyzed model fit using the maximum likelihood method. The following R code details the analysis.

library(bbmle)

## Gamma model, with Condition as a covariate on the rate
mdvol_g.mle <- mle2(Mound.Volume ~ dgamma(shape = shape, rate = gvar),
  start = list(shape = 1, gvar = 1/mean(Allmds$Mound.Volume)),
  data = Allmds, parameters = list(gvar ~ Condition))
mdvol_g.mle

## Exponential model, with Condition as a covariate on the rate
mdvol_e_cov.mle <- mle2(Mound.Volume ~ dexp(rate = avar),
  start = list(avar = 1/mean(Allmds$Mound.Volume)),
  data = Allmds, parameters = list(avar ~ Condition))
mdvol_e_cov.mle

## Exponential model without the covariate
mdvol_e.mle <- mle2(Mound.Volume ~ dexp(rate = bvar),
  start = list(bvar = 1/mean(Allmds$Mound.Volume)),
  data = Allmds)
mdvol_e.mle

In this code, Allmds refers to an R data frame containing the variables Mound.Volume and Condition. The code uses the maximum likelihood method to evaluate the fit of each model to the data and to estimate its parameter values. I performed the analysis three times. In the first analysis, I fit the gamma distribution, using Condition as a covariate. In the second and third analyses, I fit the exponential distribution to the data, once with the covariate Condition and once without the covariate.

The models are “nested”. The gamma distribution can be reduced to the exponential distribution by setting the gamma’s shape parameter to one. The exponential model without the covariate is a simplified version of the model with the covariate. Nested models can be compared with a likelihood ratio test (run with R’s anova function) to see whether the more complex model gives a significantly better fit to the data, justifying the extra complexity. The following two tables show the results of the analysis.
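The same nested comparison can be sketched outside of bbmle. The following Python snippet is illustrative only: it uses simulated volumes rather than the real mound data, omits the covariate, and fits both models by maximum likelihood before running the likelihood ratio test by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
vol = rng.exponential(scale=500.0, size=300)   # simulated "mound volumes"

# Exponential fit: the MLE of the rate is 1 / sample mean
loglik_exp = np.sum(stats.expon.logpdf(vol, scale=vol.mean()))

# Gamma fit by maximum likelihood (location pinned at zero)
shape_hat, _, scale_hat = stats.gamma.fit(vol, floc=0)
loglik_gam = np.sum(stats.gamma.logpdf(vol, shape_hat, scale=scale_hat))

# Because the models are nested (gamma with shape = 1 is the exponential),
# twice the log-likelihood difference is approximately chi-squared with
# one degree of freedom when the simpler model is adequate.
lrt = 2.0 * (loglik_gam - loglik_exp)
p_value = stats.chi2.sf(lrt, df=1)
print(f"gamma shape estimate: {shape_hat:.2f}")
print(f"LRT statistic: {lrt:.2f}, p = {p_value:.3f}")
```

With truly exponential data, the fitted gamma shape lands near one and the test statistic stays small, mirroring the result reported below.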

The initial results suggest that the exponential distribution with the covariate provides a significantly better fit to the data than the simpler model without the covariate. The gamma distribution does not provide a significantly better fit. Notice that the gamma’s shape parameter is estimated to be one, which reduces the gamma to the exponential distribution.

From this preliminary analysis, I offer the following conclusions. The exponential distribution appears to be an appropriate model for mound volume. In addition, plowed mounds may be distinctly smaller than whole mounds, contradicting my initial hypothesis. In subsequent posts, I will consider some archaeological implications and address some additional considerations that may help to explain these results.


On Monument Volume III

April 20, 2013

For my study area, the distribution of burial mound volume for plowed and whole mounds looks similar. This distribution is also quite different from the normal distribution that characterizes so many traits in the natural world. The distribution of burial mound volume resembles the form of an exponential distribution. Exponential distributions have a peak at the extreme left end of the distribution and decline steadily and rapidly from that point. The exponential distribution has a single parameter, the rate, typically denoted by \lambda . The following function gives the probability density (sometimes called the pdf) of the exponential distribution.

f(x\vert \lambda) = \lambda e^{-\lambda x}

The pdf defines a curve. For a continuous distribution such as the exponential distribution, the area under this curve over an interval along the x-axis gives the probability that a sample falls within that interval. The following illustration depicts these relationships. In the illustration, the shaded area under the curve represents the probability of a given sample falling between the two values of x.
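For the exponential distribution this area is available in closed form, since the cdf is 1 - e^{-\lambda x} . The short sketch below (Python, with an arbitrarily chosen rate and interval) computes such an interval probability and checks it against a brute-force numerical integration of the pdf:

```python
import math

def expon_interval_prob(a, b, lam):
    """P(a < X < b) for an exponential rv: the cdf difference in closed form."""
    return math.exp(-lam * a) - math.exp(-lam * b)

lam = 1.0 / 500.0                       # arbitrary rate (mean of 500)
p = expon_interval_prob(200.0, 600.0, lam)
print(f"P(200 < X < 600) = {p:.3f}")

# Cross-check: midpoint-rule integration of the pdf over the same interval
steps = 100_000
width = (600.0 - 200.0) / steps
riemann = sum(lam * math.exp(-lam * (200.0 + (i + 0.5) * width)) * width
              for i in range(steps))
assert abs(p - riemann) < 1e-6
```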


As a check on my intuition regarding the applicability of the exponential distribution, I generated a random sample of 2000 from an exponential distribution with a mean of 500. The following figure shows what such a distribution may look like. The simulation does not provide definitive proof, but it may nevertheless indicate whether a more rigorous analysis that employs the exponential distribution is worth pursuing.


At least superficially, the histogram of the simulation results resembles the histograms of mound volume shown in the previous post. This simulation did not produce the apparent outliers seen in the mound data, but the resemblance suggests that burial mound volume can be modeled with an exponential distribution. I thus modeled mound volume with an exponential distribution, using mound condition (plowed or whole) as a covariate. I performed this analysis in R with the bbmle module. In the next post, I’ll present the code and initial results.


On Monument Volume II

April 10, 2013

The previous post suggested that mound shape could be modeled as a spherical cap. I then proposed that the shape of those mounds may change through time, due to weathering and repeated plowing by modern agricultural equipment, but mound volume might remain the same. As illustrated in the following figure, mounds might become shorter but wider as they are weathered and plowed. In the figure, A represents the original mound shape, while B reflects mound shape after weathering and plowing. The height, h, has decreased over time, while the radius, a, has increased.


Other hypotheses are possible, but I will evaluate this scenario first.

I have compiled museum data on mound condition and mound size for all recorded mounds in my study region. The museum records characterize mound condition as either “whole” or “plowed”. The records did not disclose the basis for this characterization. These records also document mound height and width. For each mound, I calculated a volume, assuming that mound shape resembles a spherical cap. The following two histograms illustrate the distribution of mound volume for plowed mounds and for whole mounds.


As you can see, the distributions of mound size for plowed and whole mounds look very similar. A few outliers may occur at the right tail of both distributions. These outliers represent unusually large mounds. The similarity of the histograms suggests that a single probability distribution could be used to model monument volume. The next post will evaluate monument volume more rigorously.


On Monument Volume I

March 26, 2013

This post introduces an approach for evaluating the original size of round burial mounds. In one of the places where I’ve worked, burial mounds comprise a prominent feature of the landscape, as illustrated in the following photograph.


This prominence may be amenable to explanation through formal high-level theory. Mound size, for example, may reflect the labor used to produce it, suggesting something about the size and organizational capabilities of the group that produced the mound. In order to use this feature of the monuments to evaluate high-level theory, the modern size should be an accurate reflection of the original size.

Such monuments may erode over time, making them less conspicuous and also less reliable as an index of the characteristics of the group that produced them. Natural weathering may take its toll, but modern agricultural practices probably affected burial mounds to a greater extent. Burial mounds were sometimes plowed repeatedly. These modern practices came later to the region where my case study is located, by which time laws protecting the mounds had been enacted. Nevertheless, various processes leveled many mounds, perhaps decreasing their height and increasing their diameter. Despite these depredations, the original volume of the mounds may be preserved.

Mound shape can be modeled as a spherical cap, a geometric form representing the portion of a sphere above its intersection with a plane. Spherical caps are thus dome-shaped. The following figure illustrates a spherical cap. In the figure, h is the height of the dome, a is the radius of the dome’s base, and R is the radius of the sphere.


The total volume of a spherical cap depends on the maximum dome height, h, and on the radius of the circle where the plane intersects the sphere, a. The formula for the volume, V, of a spherical cap is:

V=(\frac{1}{6})\pi h(3a^2+h^2)

Importantly, the calculation of the volume of a spherical cap does not depend on the radius of the sphere of which it is a part. The maximum possible original height of a mound, however, should be equal to the radius of that sphere. This height can be calculated by holding the volume constant and solving for the cap whose height equals its base radius; at that point, the cap is a hemisphere, and h, a, and R are all equal. Subsequent posts will explore these ideas further and play with some data on mound size.
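Both calculations are easy to express in code. The sketch below (Python; the example height and radius are invented) computes a cap volume from h and a, then recovers the maximum possible original height by treating that volume as a hemisphere's, V = (2/3)\pi R^3 , following the reasoning above:

```python
import math

def cap_volume(h, a):
    """Volume of a spherical cap with dome height h and base radius a."""
    return (math.pi * h / 6.0) * (3.0 * a ** 2 + h ** 2)

def max_original_height(volume):
    """Tallest mound consistent with this model: a hemisphere, h = a = R.

    Setting h = a in the cap formula gives V = (2/3) * pi * R^3, so
    R = (3V / (2 pi))^(1/3).
    """
    return (3.0 * volume / (2.0 * math.pi)) ** (1.0 / 3.0)

# An invented low, wide mound: 1 m high with a 10 m base radius
v = cap_volume(1.0, 10.0)
print(f"current volume: {v:.1f} cubic m")
print(f"maximum possible original height: {max_original_height(v):.2f} m")
```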


Identifying and Explaining Intensification in Prehistoric Fishing Practices VIII: Establishing the Number of Fish Caught by Nets or Other Gear through Mixture Models

October 2, 2009

The previous post in this series provided the middle-level theory needed to quantify the number and size of fish represented in an archeological assemblage. Recall that I am interested in determining how many of these fish were caught by nets and how many were caught by other gear. These gear types should differ in the size range of fish that they are likely to capture.

The next step is to look at histograms of the data, which show the number of fish bones that occur within a particular size interval. Modes, or peaks, in the fish size-frequency histograms should reflect the use of different fishing gear. The mode of smaller fish should represent fish taken by nets, and the mode of larger fish should reflect fish taken by hooks and line or by spears. The following histograms show data from my archeological site. As noted in an earlier post, the fish bone assemblages derive from different levels within a single site, where minimal mixing has occurred among the levels.

Histograms of Fish Caudal Vertebra Height by Level from an Archaeological Site

The histograms for some of the levels show distinct modes. In particular, two modes seem to be present in the 50-55 cm level, while three modes apparently occur in the 30-35 cm level. The other levels are harder to interpret.

Clearly, the identification of these modes is not straightforward. The lack of more clear-cut patterning likely results from a heavy reliance on nets. Nets may catch both large and small fish. Other gear, like hook and line or spears, is much more likely to catch large fish. Prehistoric fishers used spears tipped with large stone points or sharpened bone. They employed hooks made from shells. Hooks and line or spears may not be able to catch fish smaller than some threshold value of size. In assemblages where net-caught fish predominate, the prevalence of net-caught fish may obscure any mode in the fish size distribution formed by fish caught with other gear.

Fortunately, statistical techniques exist that may help to distinguish separate populations mixed together in a single distribution. Finite mixture distributions model such situations. Such distributions can be analyzed using the mixdist package for R. This package allows the parameters of the contributing populations to be estimated, including the proportion of each population represented in the distribution and the mean vertebra size in each separate population. The following graph illustrates the application of a mixture model to data from the 50-55 cm level at the site.

Example of the Mixture Distribution Fit to Data from the 50-55 cm Level

For the mixture model, I fit two lognormal distributions to the data. The histogram depicts the original data. Note that the histogram interval differs from the interval used in the previous graph. The two dotted lines show the separate lognormal distributions fit to the data, and the black triangles identify the means of those distributions. The solid line shows the mixture model prediction that results from combining the two individual lognormal distributions. The gray bars at the bottom of the graphic show the deviations of the model from the observed distribution. The scale of the deviations is depicted in relative terms. This model appears to fit the data reasonably well.
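For readers without mixdist at hand, the idea can be sketched from scratch. Fitting two lognormal components is equivalent to fitting a two-component normal mixture to the log of the measurements, which the expectation-maximization (EM) algorithm handles directly. The Python sketch below uses invented sizes, proportions, and parameters, not my site's data; it simulates a mixed assemblage and recovers the mixing proportion and component means:

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented mixed assemblage: 70% small (net-caught) fish and 30% large
# fish, each lognormally distributed on the measurement scale.
n = 1000
small = rng.lognormal(mean=1.0, sigma=0.25, size=int(0.7 * n))
large = rng.lognormal(mean=2.0, sigma=0.25, size=n - int(0.7 * n))
y = np.log(np.concatenate([small, large]))  # lognormal -> normal on log scale

# Two-component normal EM on the log scale. Initialize the means at the
# extremes of the data and the standard deviations at the pooled value.
p, mu, sd = 0.5, np.array([y.min(), y.max()]), np.array([y.std(), y.std()])
for _ in range(200):
    # E-step: responsibility of the first (small-fish) component
    d1 = p * np.exp(-0.5 * ((y - mu[0]) / sd[0]) ** 2) / sd[0]
    d2 = (1.0 - p) * np.exp(-0.5 * ((y - mu[1]) / sd[1]) ** 2) / sd[1]
    r = d1 / (d1 + d2)
    # M-step: reweighted proportion, means, and standard deviations
    p = r.mean()
    mu = np.array([np.average(y, weights=r), np.average(y, weights=1.0 - r)])
    sd = np.sqrt([np.average((y - mu[0]) ** 2, weights=r),
                  np.average((y - mu[1]) ** 2, weights=1.0 - r)])

print(f"estimated mixing proportion: {p:.2f}")
print(f"estimated log-scale means: {mu[0]:.2f}, {mu[1]:.2f}")
```

With well-separated components like these, the EM estimates land close to the simulated values; real assemblages with heavily overlapping modes are harder, which is exactly the problem discussed above.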

I have also been working on a more rigorous analysis of the mixture models and their fit. This analysis is ongoing and has been plagued by some problems that I may have finally resolved. I will present some of the results and issues in the next post in this series.

© Scott Pletka and Mathematical Tools, Archaeological Problems, 2009.

Software for Archaeological Analysis

September 22, 2009

Archeologists can be very particular when choosing equipment. We prefer Marshalltown trowels, Brunton compasses, Sharpies, Trimble GPS units, and Munsell color charts. We spend a lot of time using this equipment, so such preferences are understandable. I have also thought a bit about the software that I use to analyze data once I’ve gotten out of the field and have processed my finds. A good software package for graphics and analysis can greatly speed report production and open many possibilities for the rigorous documentation of variability in the data.

Among commercial statistical software, I am a big fan of Statistica. I particularly appreciate the graphical capabilities of Statistica. Statistica can produce a wide array of graph types out of the box, and these options are all easy to customize. I have gotten considerable mileage from its options for categorized histograms, for example. Statistica also offers a wide range of analyses, available in different optional packages. Lately, however, I have begun the transition to R.

R is a statistical computing program. As such, it is extremely flexible and powerful. R is command-line driven, which makes it very easy to document and replicate the steps that you undertook during an analysis. If you’ve ever had to conduct an analysis several times, you will understand the appeal of this feature. Graphics can be generated in R, and they can be extensively customized. A lot of packages for it have been developed, providing functions by which you can conduct an extremely broad range of analyses. R is also free, which is another compelling reason to adopt it. R has an avid group of users, so you can have confidence that the program will continue to be supported in the future. The downside, however, is the steep learning curve.

To aid novices, lots of guide books to R are available, and some of them are quite good. I recommend starting with Introductory Statistics with R. With this book, you should be able to jump right into generating basic graphics and running common statistical tests. Another really useful book is Ecological Models and Data in R. This book covers model-building and maximum likelihood methods for evaluating the fit of data to those models. While the book’s examples reflect the interests of an ecologist (obvs), the book focuses on developing skills that transcend any particular discipline. The book is quite well-written. The presentation is also sufficiently detailed that you can very easily use it as a point of departure for playing with your own data and models.

Whatever you choose, make sure that the program offers a lot of flexibility. The analyses that I have done varied quite a bit from one research project to the next. Any program that you obtain will require a considerable investment—both financial and intellectual—so you’ll want to have that investment pay off again and again.


Identifying and Explaining Intensification in Prehistoric Fishing Practices V: Quantifying the Relationship between Fish Size and Fish Bone Size

September 19, 2009

The previous post in this series established that a positive relationship exists between the live weight of fish and caudal vertebra height, providing support for the use of vertebra size as an index of overall fish size. I will attempt to quantify this relationship more precisely. Many different models could be chosen, but how should we select the most appropriate one?

The model needs to be appropriate for the structure of the data. The following graph shows a plot of the data with a linear model superimposed over the data points. The graph also depicts the deviation between data and the model with vertical lines. Notice that these deviations seem to get larger as the size of the fish gets larger. Linear models assume, among other things, that the variation remains constant. A linear model may not be appropriate.

Fish Live Weight and Vertebra Height Scatterplot with Linear Model

A transformation of the data may help. Taking the log of both the live weight and the vertebra height produces more consistent variation. The next graph shows a linear model applied to the log of the data.

Log Transform of Live Fish Weight and Vertebra Height

The deviations from the model are much more consistent. This model now seems reasonably appropriate to the structure of the transformed data in the sense that it doesn’t appear to violate the model assumptions. Those assumptions include normally-distributed variation and constant variation. Ideally, however, I’d like to fit a model that has an easier interpretation. Is there any theoretical basis for applying a particular type of model to fit to this data? As it turns out, the answer is “yes”, and I will discuss this model in the next post in the series.
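The effect of the transformation is easy to demonstrate with simulated data. In the sketch below (Python; the coefficients, sample size, and error level are all invented), weight is generated from a power function of height with multiplicative error, so the raw scatter grows with size, yet an ordinary linear fit on the log-log scale recovers the exponent cleanly:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented allometry: weight = c * height^b with multiplicative lognormal
# error, so the raw-scale scatter grows as the fish get larger.
c_true, b_true, n = 0.05, 3.0, 400
height = rng.uniform(2.0, 12.0, size=n)          # e.g. vertebra heights, mm
weight = c_true * height ** b_true * rng.lognormal(0.0, 0.15, size=n)

# On the log scale the model is linear with roughly constant variance:
# log(weight) = log(c) + b * log(height) + noise
b_hat, logc_hat = np.polyfit(np.log(height), np.log(weight), 1)
print(f"estimated exponent:    {b_hat:.2f}")
print(f"estimated coefficient: {np.exp(logc_hat):.3f}")
```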


Identifying and Explaining Intensification in Prehistoric Fishing Practices III: Identification of the Fishing Gear Used from the Size-Frequency Distribution of Fish

September 16, 2009

The previous post concluded that intensification of fishing could be identified from the kind of gear used to capture those fish. Having decided that fish bone assemblages should be subdivided based on the gear used to capture the fish, the issue then becomes: how can we identify that gear? Answering this question requires middle-level theory that can link physical characteristics of the fish assemblage to gear type.

Gear types differ in the sizes of fish that can be captured by them. Nets should capture a larger range of fish sizes than other gear such as hook and line or spear. Hook and line or spears cannot effectively capture smaller species. Assemblages formed primarily from net-caught fish should have a larger proportion of small fish than those assemblages that formed from fish primarily caught by hook and line or spear. To verify this intuition, additional sources of data from which middle-level theory could be derived would be very helpful.

Baseline data on the size-frequency distribution of fish from nearshore ocean habitats, drawn from modern sources, could be compared to the size-frequency distribution of fish bone from archeological assemblages. Prehistoric fishers presumably selected a portion of the natural range of variation in fish size through their use of particular fishing gear. Thus, the comparison would facilitate the identification of such selection. Published modern data of this sort are surprisingly rare. Beach seine netting around an estuary in Alaska produced fish assemblages whose size-frequency distributions were largely unimodal with a long tail to the right. The size-frequency distribution of individual species varied from unimodal to multimodal, depending on the number of age-classes present. The applicability of these data as an analogy to fish from my study area can obviously be questioned. I don’t have any reason to believe that the form taken by the Alaskan size-frequency distributions is exceptional, however, and a consideration of the factors that produced these distributions may be useful.

Any nearshore habitat will likely contain a range of species, each represented by specimens from one or more age classes. Different species will vary in mean size within a particular age class. The aggregate of the individual size-frequency distributions is therefore likely to produce a highly variable unimodal distribution, particularly when individuals from many different species are represented. Assemblages formed from a mix of fish caught by net and fish caught by hook and line or spears should have a bimodal size-frequency distribution. The proportion of fish in each mode should reflect the emphasis placed on netting and other fishing gear. Variation in the size-frequency distribution among archeological assemblages should provide some indication of variation in the techniques used to take fish.


Identification and explanation of intensified prehistoric fishing practices I

September 13, 2009

In this series of posts, I will be exploring ways to identify and explain intensified prehistoric fishing practices. Intensification refers to the input of greater amounts of labor per capita to procure resources. As this formal definition implies, people are working harder at subsistence activities when they intensify their way of making a living. How do we determine when people are working harder from archaeological evidence? And what factors would induce people to intensify their efforts?

A lot of theories exist to address the latter question, but the former question is the more immediate problem. We need middle-level theory (sometimes labeled middle-range theory) appropriate to the nature of the archaeological evidence. Middle-level theory links archaeological data to phenomena of interest. It allows archaeologists to say with some confidence what happened in the past, based on that evidence.

My evidence comprises collections of fish bones and other artifacts from an organic-rich trash dump at a single archaeological site. Such trash dumps are often called middens. The occupation of the site spans several hundred years. The trash dump, however, has been sufficiently undisturbed since it was deposited that it could be excavated to recover evidence representative of much shorter spans of time. The mathematical tools that I found useful as I developed appropriate middle-level theory for this evidence included regression analysis, mixture models, and maximum likelihood models. In the next post, I will begin to develop the middle-level theory in detail, talking about the blind alleys down which I went, mistakes that I made, and solutions at which I arrived. Check back soon.
