## Why I Blog about Archaeology (and Math)

November 28, 2013

The upcoming 2014 Society for American Archaeology meetings in Austin will include a session on blogging and archaeology. As part of that session, Doug’s Archaeology is hosting a monthly blogging carnival, in which participating blogs post on the same topic. I’ve been invited to join this group. The first topic is “why blog about archaeology”. This question has a couple of answers.

The primary reason is that the blog provides some small motivation to continue to write and research. In this format, I can post pieces of a larger project. Those pieces get posted as they are completed. Each post thus feels like an accomplishment but also contributes to the goal of finishing the project. The first series of posts that I wrote, on the intensification of fishing, did eventually get formally published as a single piece. The blog was very helpful in developing that paper during the long slog toward publication.

My experience working on that paper and the blog led me to identify some other, related benefits of blogging. Due to space constraints, some material that I developed for the intensification paper was dropped from the final version. The blog provided a venue in which to share those thoughts, in case someone should be interested in that aspect of the topic or in greater detail. For example, I used the blog to post R code that I wrote to perform some of the analyses for the intensification paper. This experience led me to the realization that the blog was also a great place for small projects that would likely never get formally published. The next series of posts that I wrote concerned variation in burial monument size. It’s possible that some of this material will resurface as a part of another project, but for now those posts were just a fun little diversion.

Other professions, such as economics, have a better-developed tradition of posting working papers, that is, papers shared before formal publication. Blogs are a great way to develop and further this tradition and to share data and insights that might not otherwise fit within the strictures of academic publishing.

## Power Law Distributions and Model-fitting Approaches

November 18, 2013

A power law distribution has a heavy tail. The following simulation results depict the form of a power law distribution. In this example, the value of $\alpha$ is 3.6, the minimum value is 8.9, and the sample size is 75. The histogram shows the values that resulted from the simulation run, while the red line shows the theoretical distribution. Note that a few simulation values are much, much larger than the rest. To preview some results to be presented later, the distribution of debitage size in lithic reduction experiments looks similar. Such similarity is suggestive but not definitive.
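A sample like the one described above can be drawn by inverse transform sampling. The following is a minimal sketch in R, not the code behind the figure. The helper names `rpowerlaw` and `dpowerlaw` and the seed are assumptions introduced for illustration, and the density is assumed to be proportional to $x^{-\alpha}$ at or above the minimum value.

```r
# Draw n values from a continuous power law with density proportional to
# x^(-alpha) for x >= xmin, using inverse transform sampling.
# rpowerlaw and dpowerlaw are hypothetical helpers, not base R functions.
rpowerlaw <- function(n, alpha, xmin) {
  u <- runif(n)
  xmin * (1 - u)^(-1 / (alpha - 1))
}

# Theoretical density, with normalizing constant (alpha - 1) * xmin^(alpha - 1)
dpowerlaw <- function(x, alpha, xmin) {
  ifelse(x >= xmin, (alpha - 1) / xmin * (x / xmin)^(-alpha), 0)
}

set.seed(1)  # arbitrary seed, for reproducibility only
x <- rpowerlaw(75, alpha = 3.6, xmin = 8.9)

# Histogram of the simulated values with the theoretical density in red
hist(x, breaks = 20, freq = FALSE, main = "Simulated power law sample")
curve(dpowerlaw(x, 3.6, 8.9), from = 8.9, to = max(x), add = TRUE, col = "red")
```

Different seeds will give different histograms, but most runs of this size should show a handful of values well out in the right tail.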

As discussed previously, linear regression analysis has sometimes been used to evaluate the fit of the power law distribution to data and to estimate the value of the exponent, $\alpha$. This technique produces biased estimates, which the next simulation results illustrate. In the simulation, a random number generator produced a sample of numbers drawn from a power law distribution at a particular value of $\alpha$. I then analyzed this artificial data set using the linear regression approach described by Brown (2001) and using maximum likelihood estimation through a direct search of the parameter space. For a simple power law distribution, the maximum likelihood estimate could also be found analytically. I used a direct search approach, however, in anticipation of using this approach for more complex mixture models. I repeated the random number generation and analysis 35 separate times for several different combinations of $\alpha$ and sample sizes. The following histograms show the estimates for $\alpha$ using the linear regression analysis and maximum likelihood estimation to find that value. In this particular case, $\alpha$ was set to 3.6 in the random number generator, the minimum value was set to 8.9, and the sample size was 500.
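For reference, the analytical estimate mentioned above has a closed form when the power law is continuous and the minimum value is known (Clauset et al. 2009). Writing the minimum value as $x_{min}$ and the observations as $x_1, \ldots, x_n$, symbols introduced here for illustration, the estimate is:

$\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{min}} \right]^{-1}$

A direct search over $\alpha$ should converge to the same value, which offers a convenient check on the search.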

The histograms clearly suggest that the maximum likelihood estimates center closely around the true value of $\alpha$, while the regression analyses skew to a lower value. The simulations that I performed at other parameter values and sample sizes displayed similar results. Other, more extensive simulation work also supports these impressions (Clauset et al. [2009] provide these results as part of a detailed, comprehensive technical discussion). Consequently, I used maximum likelihood estimation to fit probability distributions to data in the subsequent analyses.
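The comparison described above can be sketched in R along the following lines. This is an illustration under stated assumptions, not the code used for the histograms. The helper names are hypothetical, the regression is a simplified stand-in that uses the cumulative count of individual values rather than Brown's size grades, and the amount of bias it shows need not match the figures.

```r
# Hypothetical helper: inverse transform sampling from a continuous power law
rpowerlaw <- function(n, alpha, xmin) xmin * (1 - runif(n))^(-1 / (alpha - 1))

# Maximum likelihood estimate of alpha by direct search over a bounded interval
fit_mle <- function(x, xmin) {
  negloglik <- function(alpha) {
    -sum(log(alpha - 1) - log(xmin) - alpha * log(x / xmin))
  }
  optimize(negloglik, interval = c(1.01, 10))$minimum
}

# Regression estimate: slope of log(count of values >= x) against log(x).
# For a power law this slope is -(alpha - 1), so alpha = 1 - slope.
fit_regression <- function(x) {
  xs <- sort(x)
  n_at_or_above <- rev(seq_along(xs))
  1 - unname(coef(lm(log(n_at_or_above) ~ log(xs)))[2])
}

set.seed(42)  # arbitrary seed
estimates <- t(replicate(35, {
  x <- rpowerlaw(500, alpha = 3.6, xmin = 8.9)
  c(mle = fit_mle(x, 8.9), regression = fit_regression(x))
}))

colMeans(estimates)       # average estimate from each method
apply(estimates, 2, sd)   # spread of the estimates from each method
```

Calling `hist(estimates[, "mle"])` and `hist(estimates[, "regression"])` gives a rough counterpart to the two histograms.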

References cited

Brown, Clifford T.
2001 The Fractal Dimensions of Lithic Reduction. Journal of Archaeological Science 28: 619-631.

Clauset, Aaron; Cosma Rohilla Shalizi; and Mark E. J. Newman
2009 Power-law distributions in empirical data. SIAM Review 51(4): 661-703.

## Identification of Lithic Reduction Strategies from Mixed Assemblages

November 11, 2013

This post is the first in a series that will try to characterize lithic debitage assemblages formed from more than one reduction strategy. The primary goals are to estimate the proportions of the various reduction strategies represented within these mixed assemblages and to quantify the uncertainty of these estimates. I plan to use mixture models and the method of maximum likelihood to identify the distinct components of such assemblages.

Brown (2001) suggests that the distribution of debitage size follows a power law. Power law distributions have the following probability density function:

$f(x\vert \alpha) = C x^{-\alpha}$,

where C is a constant that normalizes the distribution, so the density integrates to one. The value of C thus depends on the exponent $\alpha$ and on the minimum value of x to which the power law applies.
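As a worked step, the constant can be derived by requiring the density to integrate to one. This sketch assumes a continuous power law that decays as $x^{-\alpha}$ with $\alpha > 1$ and that applies only at or above a minimum value, written here as $x_{min}$ (a symbol introduced for illustration):

$\int_{x_{min}}^{\infty} C x^{-\alpha} dx = \frac{C x_{min}^{1-\alpha}}{\alpha - 1} = 1$, so $C = (\alpha - 1) x_{min}^{\alpha - 1}$.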

Based on analysis of experimentally-produced assemblages, Brown further suggests that the exponent, $\alpha$, of these power law distributions varies among different reduction strategies. Thus, different reduction strategies produce distinctive debitage size distributions. This result could be very powerful, allowing reduction strategies from a wide variety of contexts to be characterized and distinguished. The technique used by Brown to estimate the value of the exponent, however, has some technical flaws.

Brown (2001) fits a linear regression to the relationship between the log of flake size grade and the log of the cumulative count of flakes in each size grade. In its favor, this approach seemingly reduces the effects of small sample sizes and can be easily replicated. The regression approach, on the other hand, also produces biased estimates of the exponent and does not allow the fit of the power law model to be compared to other probability density functions.

Maximum likelihood estimates, using data on the size of each piece of debitage, produce more reliable estimates of the exponent of a power law. Maximum likelihood estimates can also be readily compared among different distributions fit to the data, to evaluate whether a power law is the best model to describe debitage size distributions. The next post will illustrate the use of the linear regression approach and the maximum likelihood approach on simulated data drawn from a power law distribution.
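One common way to compare models fit by maximum likelihood is to compare their maximized log-likelihoods, penalized for the number of parameters, as in the Akaike information criterion (AIC). The following minimal sketch in R compares a power law with an exponential distribution shifted to begin at the same minimum value. The data are simulated, the choice of AIC and of the competing model are assumptions made for illustration, and the minimum value is treated as known.

```r
set.seed(7)  # arbitrary seed
xmin <- 8.9

# Simulated stand-in for debitage sizes: a power law sample with alpha = 3.6
x <- xmin * (1 - runif(500))^(-1 / 2.6)

# Power law: maximize the log-likelihood over alpha by direct search
ll_power <- function(alpha) sum(log(alpha - 1) - log(xmin) - alpha * log(x / xmin))
fit_power <- optimize(ll_power, interval = c(1.01, 10), maximum = TRUE)

# Exponential shifted to start at xmin: the MLE of the rate has a closed form
rate_hat <- 1 / mean(x - xmin)
ll_exp <- sum(dexp(x - xmin, rate = rate_hat, log = TRUE))

# AIC = 2k - 2 * logLik, with one estimated parameter (k = 1) in each model
aic <- c(power_law = 2 - 2 * fit_power$objective,
         shifted_exponential = 2 - 2 * ll_exp)
aic
```

The model with the lower AIC is preferred. Because these data were simulated from a power law, the power law should come out ahead here.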

Reference cited

Brown, Clifford T.
2001 The Fractal Dimensions of Lithic Reduction. Journal of Archaeological Science 28: 619-631.

© Scott Pletka and Mathematical Tools, Archaeological Problems, 2013

## More Thoughts on Mound Size Variability

June 29, 2013

This post begins to explore additional patterning in mound size, refining some of my earlier observations and offering some hypotheses for evaluation. Suppose mound-building groups occupied stable territories over the span of several generations or longer. Within the territory held by such groups, they built burial mounds. Many burial mounds within a given area may thus have been produced by the same group or lineage. Under these circumstances, burial mounds located in close proximity should be more likely to be the product of a single group or lineage. If the group traits that influenced mound volume were also relatively stable through time, burial mounds located near to each other should be similar in size. As an initial attempt to evaluate these claims, I looked at the relationship in mound size between mounds that were nearest neighbors and between randomly-paired mounds.

Recall that most mounds have been affected by modern plowing and other disturbances, but some mounds have been largely spared such damage. The museum records that I used characterized these undamaged mounds as “whole”. The museum records documented 287 whole mounds. To make sure that the comparisons were fair, I limited the sample of nearest neighbors to just those whole mounds that had another whole mound as their nearest neighbor. I eliminated duplicate pairings, so each pair of nearest neighbors was only considered once. The imposition of these constraints shrank the nearest neighbor sample size to 49. Finally, I ran a simple linear regression to evaluate the relationship between the size of the mounds in these nearest neighbor pairings. Because the distribution of mound volume can be modeled as an exponential distribution, I used the log of mound volume in the regression analysis. Without this transformation, any relationship in mound size between the nearest neighbors would be unlikely to be well approximated by a straight line.
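The pairing steps above can be sketched in R as follows. This is a minimal illustration on made-up data, not the analysis itself. The data frame, its column names, and the number of mounds are assumptions, and which member of each pair serves as the predictor is arbitrary.

```r
set.seed(3)  # arbitrary seed
# Hypothetical stand-in data: coordinates, condition, and volume for 200 mounds
mounds <- data.frame(
  x = runif(200), y = runif(200),
  condition = sample(c("whole", "plowed"), 200, replace = TRUE),
  volume = rexp(200, rate = 1 / 500)
)

# Distance matrix among all mounds; a mound cannot be its own neighbor
d <- as.matrix(dist(mounds[, c("x", "y")]))
diag(d) <- Inf
nn <- apply(d, 1, which.min)  # index of each mound's nearest neighbor

# Keep whole mounds whose nearest neighbor is also whole
keep <- which(mounds$condition == "whole" & mounds$condition[nn] == "whole")

# Store each pair with the smaller index first, then drop duplicate pairings
pairs <- unique(cbind(pmin(keep, nn[keep]), pmax(keep, nn[keep])))

# Regress log volume of one member of each pair on log volume of the other
fit <- lm(log(mounds$volume[pairs[, 2]]) ~ log(mounds$volume[pairs[, 1]]))
coef(fit)
```

Because the simulated volumes are unrelated to location, the slope here carries no meaning. The sketch only shows the bookkeeping.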

I then sampled without replacement from the 287 whole mounds to obtain 49 randomly-selected pairs. As with the nearest neighbors, I performed a simple linear regression, using log volume. I repeated this procedure 500 times. The repeated sampling and analysis allowed me to develop a null distribution for the regression coefficients, that is, the values expected when the paired mounds have no real relationship.
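The resampling procedure can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the original analysis. The volumes are simulated, the function name is hypothetical, and the sketch reads the description as drawing 98 distinct mounds per replicate and splitting them into 49 pairs.

```r
set.seed(11)  # arbitrary seed
# Hypothetical stand-in for the volumes of the 287 whole mounds
volume <- rexp(287, rate = 1 / 500)

# One replicate: draw 98 mounds without replacement, split them into 49 pairs,
# and regress log volume of one member on log volume of the other
random_pair_coefs <- function(volume, n_pairs = 49) {
  idx <- sample(length(volume), 2 * n_pairs)  # sample() omits replacement by default
  first <- log(volume[idx[1:n_pairs]])
  second <- log(volume[idx[(n_pairs + 1):(2 * n_pairs)]])
  coef(lm(second ~ first))
}

null_coefs <- t(replicate(500, random_pair_coefs(volume)))
colnames(null_coefs) <- c("intercept", "slope")

# Range of slopes expected when pairs have no real relationship
quantile(null_coefs[, "slope"], c(0.025, 0.5, 0.975))

# Share of null slopes at least as large as a chosen reference slope
mean(null_coefs[, "slope"] >= 0.75)
```

The last line gives an empirical one-sided p-value for a reference slope. The value 0.75 is the nearest neighbor slope reported in this post.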

I expected that the randomly-selected pairs would not have a meaningful relationship. The slope of the regression line should be close to zero for these samples. In contrast, the size of the nearest neighbor pairs should be positively correlated, so the slope of the regression line should be significantly larger than zero. The following two figures show the distribution of the regression coefficients, the intercept and slope, for the randomly-selected pairs.

Notice, in particular, that the distribution of the slope clusters near zero as predicted. This result indicates that the randomly-selected pairs do not have a meaningful relationship with each other, at least with respect to size.

These distributions contrast with the regression coefficients calculated for the nearest neighbors. The intercept is 0.90, and the slope is 0.75. These values are completely beyond the range of values estimated for the randomly-selected pairs. This experiment shows that the size of nearest neighbors is significantly and positively correlated. The results lend some support to the notion that stable groups produced these mounds. At the very least, the results provide encouragement to further explore the relationship between mound size and mound spatial distribution. Such work should probably make use of the spatial analysis tools available in GIS programs.


## A Very Preliminary Interpretation of Mound Size Variability

May 11, 2013

Monumental architecture, by virtue of its scale, implies something about the organizational capabilities of the groups that produced it. The previous analysis of burial mound size further implies something about the variation in those capabilities. I explore some of those implications at greater length here.

The following thoughts should be considered preliminary. The original goal of this particular analysis was very modest, concerned with establishing a reliable measure of monument scale or prominence. My hope was that mound volume had stayed reasonably constant despite the effects of weathering and other processes. The analysis was being done as part of a project regarding monument function and social organization. While the analysis showed that many mounds lost volume as a result of modern plowing, it also showed that the volume of plowed mounds and whole mounds varied in very similar fashion. Variation in mound volume can be modeled with the exponential distribution. I did not expect this result at the outset.

I have often regarded mounds as potentially reflecting the “strength” of the groups that built them. Group strength might be a function of many different factors, such as group size, the productivity of the territory that the group occupies, the group’s organizational capabilities and the size of the social network upon which the group could call. Groups that scored higher on these variables should have been capable of building larger mounds. Groups that scored lower on these variables should have been limited to building smaller mounds. A large list of qualities could thus contribute to group strength and to burial mound size. I assumed that each factor would have a small additive effect on strength. Consequently, I supposed that variation in group strength and mound size should take the form of a normal distribution.

Clearly, this intuition was wrong. Upon further reflection, I think that I underestimated the contribution of social networks. Their contribution is probably not minor. Ethnographic studies of leadership in small-scale societies illustrate the hard work and emphasis that group leaders often put on the maintenance of their networks. The effect of each additional ally is probably not merely additive, since each ally that gets incorporated has the potential to contribute their own unique allies to the network. Modern studies of social networks indicate that variability among individuals in network size has a heavy-tailed distribution, where most individuals have a relatively small network and a few individuals have very large networks. The mound data suggest that similar processes were at play.

Before getting too carried away, let me emphasize again that this interpretation is very preliminary. It is, nevertheless, consistent with other archaeological evidence for the operation of long-distance exchange networks at the time. The results also illustrate the potential value of this form of statistical modeling. The type of probability model that can be fit to the data (whether normal, exponential, or some other model) reflects the type of processes that operated in the past. The modeling thus constrains the set of possible interpretations that should be considered.


## On Monument Volume IV

April 29, 2013

This post evaluates burial mound volume, fitting various probability models to the data. As noted previously, the exponential distribution seems like an appropriate model to fit to the mound volume data. This model is not the only possibility, of course, so I will also consider an alternative, the gamma distribution. The exponential distribution is a simplified version of the gamma distribution.

The gamma probability density function (pdf) is:

$f(x\vert \alpha , \lambda) = \frac{\lambda ^{\alpha} x^{\alpha - 1}e^{- \lambda x}}{\Gamma (\alpha)}$,

where:

$\alpha$ is the shape parameter,

$\lambda$ is the rate parameter,

and $\Gamma$ is the gamma function.

The gamma function typically takes the following form:

$\Gamma (\alpha) = \int_{0}^{\infty} t^{\alpha -1} e^{-t} dt$

Depending on the parameter values, the graph of the gamma pdf can take a wide variety of shapes, including forms that resemble the bell-shaped curve of the normal distribution. The following illustration shows some of the possible variation.
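The formula can be checked against R's built-in gamma density. The following minimal sketch writes the pdf directly from the expression above and compares it with `dgamma`. The function name `gamma_pdf` and the parameter values are assumptions chosen for illustration.

```r
# Gamma density written directly from the formula above
gamma_pdf <- function(x, alpha, lambda) {
  lambda^alpha * x^(alpha - 1) * exp(-lambda * x) / gamma(alpha)
}

x <- seq(0.1, 10, by = 0.1)

# The hand-written version agrees with R's built-in dgamma
all.equal(gamma_pdf(x, alpha = 2.5, lambda = 0.8), dgamma(x, shape = 2.5, rate = 0.8))

# With the shape set to one, the gamma reduces to the exponential
all.equal(dgamma(x, shape = 1, rate = 0.8), dexp(x, rate = 0.8))

# A few of the shapes the density can take (parameter values are arbitrary)
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 10, ylab = "density")
curve(dgamma(x, shape = 2, rate = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 9, rate = 2), add = TRUE, lty = 3)
```

The third curve, with a larger shape value, is the one that begins to resemble a bell shape.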

To evaluate the relationship between mound volume and mound condition (plowed and whole) under the gamma and exponential distributions, I analyzed model fit using the maximum likelihood method. The following R code details the analysis.

    # Load the bbmle package, which provides mle2
    library(bbmle)

    # Gamma model, with the rate varying by mound condition
    mdvol_g.mle = mle2(Allmds$Mound.Volume~dgamma(shape=shape, rate = gvar), start=list(shape = 1, gvar = 1/mean(Allmds$Mound.Volume)), data=Allmds, parameters = list(gvar~Allmds$Condition))
    mdvol_g.mle

    # Exponential model, with the rate varying by mound condition
    mdvol_e_cov.mle = mle2(Allmds$Mound.Volume~dexp(rate = avar), start=list(avar = 1/mean(Allmds$Mound.Volume)), data=Allmds, parameters = list(avar~Allmds$Condition))
    mdvol_e_cov.mle

    # Exponential model without the covariate
    mdvol_e.mle = mle2(Allmds$Mound.Volume~dexp(rate = bvar), start=list(bvar = 1/mean(Allmds$Mound.Volume)), data=Allmds)
    mdvol_e.mle

In this code, Allmds refers to an R data frame containing the variables Mound.Volume and Condition. The code uses the maximum likelihood method to evaluate the fit of the gamma and exponential distributions to the data and to estimate parameter values. I performed the analysis three times. In the first analysis, I fit the gamma distribution, using Condition as a covariate. In the second and third analyses, I fit the exponential distribution to the data, once with the covariate Condition and once without the covariate.

The models are “nested”. The gamma distribution can be reduced to the exponential distribution by setting the gamma’s shape parameter to one. The exponential model without the covariate is a simplified version of the model with the covariate. Nested models can be compared using a likelihood ratio test, which the anova function in R reports for models fit with bbmle, to see whether the more complex model gives a significantly better fit to the data, justifying the extra complexity. The following two tables show the results of the analysis.
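The logic of comparing nested models can be sketched in base R without bbmle. This is a minimal illustration on simulated data, not a reproduction of the analysis or its tables. The data, the variable names, and the helper `lrt` are assumptions, and the parameters are fit on the log scale to keep them positive, which differs from modeling the rate directly.

```r
set.seed(5)  # arbitrary seed
# Hypothetical stand-in data: exponential volumes with a different mean by condition
condition <- rep(c("plowed", "whole"), each = 150)
volume <- rexp(300, rate = ifelse(condition == "whole", 1 / 800, 1 / 400))
is_whole <- as.numeric(condition == "whole")

# Negative log-likelihoods; parameters are on the log scale to keep them positive
nll_exp <- function(p) -sum(dexp(volume, rate = exp(p[1]), log = TRUE))
nll_exp_cov <- function(p) -sum(dexp(volume, rate = exp(p[1] + p[2] * is_whole), log = TRUE))
nll_gamma_cov <- function(p) {
  -sum(dgamma(volume, shape = exp(p[3]), rate = exp(p[1] + p[2] * is_whole), log = TRUE))
}

start <- log(1 / mean(volume))
fit_exp <- optim(start, nll_exp, method = "BFGS")
fit_exp_cov <- optim(c(start, 0), nll_exp_cov, method = "BFGS")
fit_gamma_cov <- optim(c(start, 0, 0), nll_gamma_cov, method = "BFGS")

# Likelihood ratio test: twice the gain in log-likelihood, compared to a
# chi-squared distribution with df equal to the number of added parameters
lrt <- function(simple, complex, df) {
  stat <- 2 * (simple$value - complex$value)
  c(statistic = stat, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

lrt(fit_exp, fit_exp_cov, df = 1)        # does the covariate help?
lrt(fit_exp_cov, fit_gamma_cov, df = 1)  # does the gamma's shape parameter help?
```

A small p-value in the first test favors keeping the covariate. A large p-value in the second suggests that the extra shape parameter is not needed.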

The initial results suggest that the exponential distribution with the covariate provides a significantly better fit to the data than the simpler model without the covariate. The gamma distribution does not provide a significantly better fit. Notice that the gamma’s shape parameter is estimated to be one, which reduces the gamma to the exponential distribution.

From this preliminary analysis, I offer the following conclusions. The exponential distribution appears to be an appropriate model for mound volume. In addition, plowed mounds may be distinctly smaller than whole mounds, contradicting my initial hypothesis. In subsequent posts, I will consider some archaeological implications and address some additional considerations that may help to explain these results.


## On Monument Volume III

April 20, 2013

For my study area, the distribution of burial mound volume for plowed and whole mounds looks similar. This distribution is also quite different from the normal distribution that characterizes so many traits in the natural world. The distribution of burial mound volume resembles the form of an exponential distribution. Exponential distributions have a peak at the extreme left end of the distribution and decline steadily and rapidly from that point. The exponential distribution has a single parameter, the rate, typically denoted by $\lambda$. The following function gives the probability density (sometimes called the pdf) of the exponential distribution.

$f(x\vert \lambda) = \lambda e^{-\lambda x}$

The pdf defines a curve. For a continuous distribution such as the exponential distribution, the area under this curve provides the probability of a sample taking on a value within the interval along the x-axis under the curve. The following illustration depicts these relationships. In the illustration, the shaded area under the curve represents the probability of a given sample falling between the two values of x.
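The relationship between area and probability can be checked numerically. In the following minimal R sketch, the rate and the two values of x are arbitrary choices made for illustration.

```r
lambda <- 1 / 500   # arbitrary rate, chosen for illustration
lower <- 200
upper <- 600

# Probability as the area under the pdf between the two values of x
area <- integrate(function(x) lambda * exp(-lambda * x), lower, upper)$value

# The same probability from the cumulative distribution function
from_cdf <- pexp(upper, rate = lambda) - pexp(lower, rate = lambda)

c(area = area, from_cdf = from_cdf)
```

Both calculations give a probability of about 0.37 for these values.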

As a check on my intuition regarding the applicability of the exponential distribution, I generated a random sample of 2000 from an exponential distribution with a mean of 500. The following figure shows what such a distribution may look like. The simulation does not provide definitive proof, but it may nevertheless indicate whether a more rigorous analysis that employs the exponential distribution is worth pursuing.
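A simulation of this kind takes only a few lines of R. The following minimal sketch is not the code behind the figure, and the seed and histogram settings are arbitrary assumptions.

```r
set.seed(2013)  # arbitrary seed
sim <- rexp(2000, rate = 1 / 500)  # the rate is the reciprocal of the mean

mean(sim)  # should fall near 500
hist(sim, breaks = 40, main = "Simulated exponential sample", xlab = "value")
```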

At least superficially, the histogram of the simulation results resembles the histograms of mound volume shown in the previous post. This simulation did not produce the apparent outliers seen in the mound data, but the resemblance suggests that burial mound volume can be modeled with an exponential distribution. I thus modeled mound volume with an exponential distribution, using mound condition (plowed or whole) as a covariate. I performed this analysis in R with the bbmle package. In the next post, I’ll present the code and initial results.


## On Monument Volume II

April 10, 2013

The previous post suggested that mound shape could be modeled as a spherical cap. I then proposed that the shape of those mounds may change through time, due to weathering and repeated plowing by modern agricultural equipment, but mound volume might remain the same. As illustrated in the following figure, mounds might become shorter but wider as they are weathered and plowed. In the figure, A represents the original mound shape, while B reflects mound shape after weathering and plowing. The height, h, has decreased over time, while the radius, a, has increased.

Other hypotheses are possible, but I will evaluate this scenario first.

I have compiled museum data on mound condition and mound size for all recorded mounds in my study region. The museum records characterize mound condition as either “whole” or “plowed”. The records did not disclose the basis for this characterization. These records also document mound height and width. For each mound, I calculated a volume, assuming that mound shape resembles a spherical cap. The following two histograms illustrate the distribution of mound volume for plowed mounds and for whole mounds.
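The volume calculation can be sketched in R using the spherical cap formula from the earlier post. The records, column names, and units below are made up for illustration, and the sketch assumes that the recorded width is the diameter of the mound's base.

```r
# Volume of a spherical cap from its height h and base radius a
cap_volume <- function(h, a) (1 / 6) * pi * h * (3 * a^2 + h^2)

# Hypothetical records; the column names and units are assumptions
records <- data.frame(
  condition = c("whole", "plowed", "whole"),
  height = c(1.5, 0.6, 2.0),   # meters
  width = c(12, 15, 18)        # meters, treated as the base diameter
)

# The base radius is half the recorded width
records$volume <- cap_volume(records$height, records$width / 2)
records
```

Calling `hist(records$volume[records$condition == "whole"])` on a full data set would produce a histogram like those that follow.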

As you can see, the distributions of mound size for plowed and whole mounds look very similar. A few outliers may occur at the right tail of both distributions. These outliers represent unusually large mounds. The similarity of the histograms suggests that a single probability distribution could be used to model monument volume. The next post will evaluate monument volume more rigorously.


## On Monument Volume I

March 26, 2013

This post introduces an approach for evaluating the original size of round burial mounds. In one of the places where I’ve worked, burial mounds comprise a prominent feature of the landscape, as illustrated in the following photograph.

This prominence may be amenable to explanation through formal high-level theory. Mound size, for example, may reflect the labor used to produce it, suggesting something about the size and organizational capabilities of the group that produced the mound. In order to use this feature of the monuments to evaluate high-level theory, the modern size should be an accurate reflection of the original size.

Such monuments may erode over time, making them less conspicuous and also less reliable as an index of the characteristics of the group that produced them. Natural weathering may take its toll, but modern agricultural practices probably affected burial mounds to a greater extent. Burial mounds were sometimes plowed repeatedly. These modern practices came later to the region where my case study is located, by which time laws protecting the mounds had been enacted. Nevertheless, various processes leveled many mounds, perhaps decreasing their height and increasing their diameter. Despite these depredations, the original volume of the mounds may be preserved.

Mound shape can be modeled as a spherical cap, a geometric form representing the portion of a sphere above its intersection with a plane. Spherical caps are thus dome-shaped. The following figure illustrates a spherical cap. In the figure, h is the height of the dome, a is the radius of the dome’s base, and R is the radius of the sphere.

The total volume of a spherical cap depends on the maximum dome height, h, and on the radius of the circle where the plane intersects the sphere, a. The formula for the volume, V, of a spherical cap is:

$V=(\frac{1}{6})\pi h(3a^2+h^2)$

Importantly, the calculation of the volume of a spherical cap does not depend on the radius of the sphere of which it is a part. The maximum possible original height of a mound, however, should be equal to the radius of that sphere. This height can be calculated by holding the volume constant and finding the value at which the height and the base radius are equal. At that point, the cap is a hemisphere, and its height and base radius both equal the radius of the sphere. Subsequent posts will explore these ideas further and play with some data on mound size.
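As a worked step under the assumption stated above, setting $a = h$ in the volume formula and solving for the height gives the maximum possible height. The symbol $h_{max}$ is introduced here for illustration:

$V = (\frac{1}{6})\pi h_{max}(3h_{max}^2 + h_{max}^2) = \frac{2}{3}\pi h_{max}^3$, so $h_{max} = (\frac{3V}{2\pi})^{1/3}$.

For example, a mound with a volume of 100 cubic meters could have stood no more than about 3.6 meters high under this model.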


## A Useful Resource for Understanding Combinatorics and Probability Theory

January 19, 2011

Combinatorial principles can be difficult to understand from modern textbooks. The more elegant the explanation, the more it is couched in abstract mathematical language. A book that I picked up a long time ago has helped me to bridge the gap between those elegant, abstract textbook explanations and a concrete understanding of them. The book Lady Luck by Warren Weaver provides a relatively rigorous explanation of probability in plain, clear language. I have turned to this book many times to supplement my textbooks when the textbook explanation of a concept proved elusive.

© Scott Pletka and Mathematical Tools, Archaeological Problems, 2011.