Introduction

Humans have expressed physical experiences and abstract ideas in artistic paintings such as cave paintings, frescos in cathedrals and even graffiti on city walls. Such paintings, to convey intended messages, consist of three fundamental building blocks: points, lines and planes. Recent studies have shed light on interesting mathematical patterns between these building blocks in paintings.

Artistic styles have been analyzed with various statistical techniques such as fractal analysis1, the wavelet-based technique2, the multi-resolution hidden Markov method3, the Fisher kernel based approach4 and the sparse coding model5,6. Recently, these methods have also been applied to other forms of cultural heritage such as literature7,8,9,10 and music11,12,13,14. Such quantitative analysis is called “stylometry,” a term that originates from the analysis of literature to identify characteristic literary styles9.

In this study, we add a new dimension to the body of stylometry studies by analyzing a large-scale database of artistic paintings. Using digital image processing techniques, we quantify the change in the variety of painted colors and their spatial structures over ten historical periods of western painting (medieval, early renaissance, northern renaissance, high renaissance, mannerism, baroque, rococo, neoclassicism, romanticism and realism), from the 11th century to the mid-19th century. Digital images of the paintings were obtained from the Web Gallery of Art15, a searchable database of European paintings and sculptures consisting of over 29,000 pieces dating from 1000 to 1850. Most of the identifiable images carry information on schools, periods and artists, and their resolution is high enough for statistical analysis.

Here we focus on three quantities: the usage of each color, the variety of painted colors and the roughness of the brightness of images. First, we count how often a certain color appears in a painting for each period. From the frequency histogram, we find a clear difference between classical paintings and photographs. Next, we measure a fractal dimension of painted colors for each period in a color space, which can be regarded, by analogy, as reflecting the color ‘palette’ of that period. Interestingly, the fractal dimension of the medieval period is lower than that of other periods. The detailed results and our inferences are discussed in the Results section. Last, we consider how rough or smooth an image is in terms of its brightness. To quantify the roughness of brightness, we apply a well-known roughness exponent measurement from statistical physics. We find that the roughness exponent increases gradually over the 10 periods, which is consistent with historical circumstances such as the birth of new painting techniques like chiaroscuro and sfumato16,17. (Chiaroscuro and sfumato are major painting techniques developed and widely used during the Renaissance period. The compound word chiaroscuro is formed from the Italian words chiaro (light) and oscuro (dark), and refers to an artistic technique that delineates tonal contrasts and voluminous objects with a dramatic use of light. Precursors of chiaroscuro are Leonardo da Vinci (1452–1519) and Michelangelo Merisi da Caravaggio (1571–1610), and Rembrandt van Rijn (1606–1669) is a representative artist well known for his use of chiaroscuro. The Italian word sfumato is derived from the Italian term fumo, which literally means “smoke”. Leonardo da Vinci described sfumato as a blending of colors without lines or borders, in the manner of smoke or beyond the focus plane. In other words, sfumato is a painting technique that expresses a gradual fade-out between object and background, avoiding harsh outlines.) By analyzing these three properties, we propose new approaches to quantitatively analyze a large-scale database of paintings. Applying our method to Jackson Pollock's controversial drip paintings, we infer that they are quite different from the works of other painters.

Results

Chromo-spectroscopy

First we investigate how many different kinds of color appear in a painting and how often a certain color is painted, which is similar to Zipf's plot for word frequencies in literature18. We name this analysis “chromo-spectroscopy.” A color is considered to be like a word for a painter. As an example of chromo-spectroscopy, Fig. 1a displays the fraction of each color used in a painting in descending rank order. If each color were chosen from a palette uniformly at random, the frequency of each color would follow a binomial distribution (see the supplement for more detail). Because a rank plot is the inverse of the cumulative distribution function, the rank plot for this random process would follow the inverse of the regularized incomplete beta function19 (see black dots in Fig. 1a). Interestingly, however, the observed rank-ordered color-usage distribution (RCD) shows a long tail, which differs from the inverse of the regularized incomplete beta function (see Fig. 1a).
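The rank-ordering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_ordered_color_usage` and the toy pixel list are hypothetical, and the per-period normalization over many paintings mentioned in the Fig. 1 caption is not reproduced here.

```python
from collections import Counter

def rank_ordered_color_usage(pixels):
    """Return the fraction of each distinct color, most used first.

    pixels: iterable of (R, G, B) tuples for a single image.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    # most_common() yields (color, count) pairs in descending count order.
    return [n / total for _, n in counts.most_common()]

# Hypothetical toy "image": 8 pixels, three distinct colors.
toy_pixels = [(200, 30, 30)] * 5 + [(30, 30, 200)] * 2 + [(250, 250, 250)]
rcd = rank_ordered_color_usage(toy_pixels)
print(rcd)  # [0.625, 0.25, 0.125]
```

Plotting such a list against its index (rank) on logarithmic axes gives a curve of the kind shown in Fig. 1a for a single image.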

Figure 1

Rank-ordered color-usage distributions for an image and periods.

(a) Fraction distribution of each color in a descending rank order for the art work of German painter Johann Erdmann Hummel (1769-1852), “Schloss Wilhelmshöhe with the Habichtswald” (This image is out of copyright.). The horizontal axis indicates the rank of a color in frequency and the vertical axis denotes the proportion of a color in an image. The most (least) used color is located at the leftmost (rightmost) position on the horizontal axis. The black dots represent color choices from the same palette uniformly at random. (b) Rank-ordered color-usage distributions (RCDs) of the 10 periods and photographs. Note that the distribution of photographs clearly shows a different tail. Inset: RCD for the neoclassicism period. The displayed color corresponds to its rank. Note that the fraction is normalized by the image size and the number of paintings in each period.

Figure 1b shows RCDs for 10 periods of European art history and photographs. The RCD of a period represents how many colors are used and how often a specific color appears during the period. All periods of painting show a universal distribution curve, but the rank of each color differs from period to period. The RCD of photographs is similar to that of paintings in the initial power-law part, but its exponential tail deviates significantly from that of paintings, as shown in Fig. 1b. To clarify the difference in the tail section of RCDs between paintings and photographs, we analyze RCDs of photographs after applying several painting filters from popular software. The tail of the distribution changes clearly only when the oil painting filter is applied. An oil painting filter usually has two parameters, range and level, which are related to the size of the paint brush and the smearing intensity. These two parameters appear to influence the shape of the exponential tail of the RCD. Another interesting fact is that there is no clear difference between RCDs of photographs and hyper-realism paintings, which are drawn extremely finely with the aid of a microscope and are hard to distinguish from photographs with the unaided eye (see Figure S4b in the supplement). This suggests that the tail section of the RCD is the only feature that quantitatively distinguishes paintings from photographs. The tail of the RCD represents the frequency of noisy colors, or the level of detail in the image.

Fractal pattern and color palette

RCDs for all periods of paintings show quite universal distribution curves. However, the most commonly painted color is different for each period. To characterize the variety of colors more quantitatively, while ignoring their individual frequencies, we investigate the fractal pattern of the painted colors in the RGB color space for each period.

To examine the fractal characteristics of painted colors for each period, we measure the box-counting dimension20 of the paintings in the RGB color space and compare them with those of two iconoclastic artists: Pieter Bruegel the Elder and Jackson Pollock. Each color used in a painting is plotted as a point in the RGB color space. Following the definition of the box-counting dimension, we iteratively change the side length of the box ε from ε = 1 to ε = 32 and count the number of non-empty boxes. A non-empty box indicates that at least one color within the box is used in the painting. If the distribution of colors in the color space is homogeneous, the box-counting dimension is 3. Conversely, if the box-counting dimension is less than 3, the distribution in the color space is heterogeneous and fractal, which means that some axes are preferred or that the distribution is composed of a preferred color scheme. In this sense, the box-counting dimension quantifies the spatial uniformity or fractality of painted colors for each artistic period.

Figure 2a shows that the box-counting dimensions of paintings from the 10 historic periods lie between 2.6 and 2.8, except for the medieval period. As Fig. 2b shows, only the box-counting dimension of the medieval period is close to that of Jackson Pollock's drip paintings (below 2.4), in which he intentionally used a limited set of colors. In addition, the box-counting dimension for the paintings of Pieter Bruegel the Elder is approximately 2.55. The low box-counting dimension indicates a strong preference for a small number of selected colors in the medieval age. That is, the color palette of the medieval age is significantly different from that of the other periods.

Figure 2

Box-counting dimension and its tendency.

(a) The results of box-counting dimension over the 10 artistic periods display a significant difference of the medieval period from the other periods. Error bars indicate the standard deviation. (b) The number of boxes to cover the color space versus box size. The fractal dimension in the color space of Jackson Pollock's drip paintings is measured around 2.35, similar to that of medieval paintings (see also Figure S5 in the supplement), but dissimilar to that of another iconoclastic artist Pieter Bruegel the Elder.

The reason why the box-counting dimensions for the medieval age and Jackson Pollock differ from the others can be found in historical facts. First, specific rare pigments were preferred for political and religious reasons in the medieval age despite their high cost. Second, physical mixing of different pure colors was not practiced in that period, owing to the tendency to emphasize the purity of colors and materials themselves. Instead, artists in the Middle Ages recoated a colored canvas to represent various colors. The drip paintings of Jackson Pollock are also formed by recoating single-color dripping patterns on other layers, and the number of colors used is smaller than in other western paintings before the 20th century. Furthermore, oil colors and color mixing techniques were not fully developed until the Renaissance age. The introduction of new expression tools, like pastels and fingers, and of painting techniques, such as chiaroscuro and sfumato, made much more colorful and natural expressions possible after the Renaissance period21. The difference in fractal dimension between the medieval and other periods may quantitatively reflect these historical facts and the differences in painting technique in art history.

Spatial renormalization and fixed point analysis

In the RGB color space, each painting has its own set of scattered color pixels. To analyze the characteristics of color usage while taking the variety of colors in the paintings into account, we define three representative points in the RGB color space. The first is the center of usage frequency (CM), which is analogous to the center of mass in physics: it is calculated from the usage frequency and the position of each color in the color space, just like the center of mass of a physical object. The second is the fixed point of the painting (FP), a concept borrowed from real-space renormalization in physics: when a painting is repeatedly resized, it eventually becomes a single pixel, and that pixel is the FP. The third is the fixed point of the randomized painting (SFP), which is obtained in the same way as the FP except that the pixels of the painting are shuffled first. If the spatial information of the scattered colors is irrelevant, FP and SFP would not be significantly different. Note that the center of mass of a shuffled image (SCM) is the same as the original CM. The two vectors d1 and d2, pointing from CM to FP and from CM to SFP respectively, can then be compared to quantify the randomness of the spatial arrangement of the colors in paintings. If d1 and d2 are similar, either the colors used in a painting are not diverse or their spatial arrangement is close to random. Figure 3c suggests that the color arrangement of Jackson Pollock's drip paintings is quite different from that of other paintings, showing that Pollock's art work is quite random, especially in the spatial arrangement of colors. On the other hand, the two fixed points of Pieter Bruegel the Elder's paintings are far from each other.

Figure 3

Spatial renormalization of original and shuffled images.

(a) An example of transforming an image into a fixed point. (Figure 1a also contains the image which is out of copyright.) (b) An illustrative example of the center of mass (CM), the fixed point (FP) and the shuffled fixed point (SFP) in RGB color space. (c) Norm of difference of d1 and d2 over 10 periods and comparison with Pollock's drip paintings and Pieter Bruegel the Elder's paintings. (d) Norm of cross product of d1 and d2 over 10 periods and comparison with Pollock's drip paintings and Pieter Bruegel the Elder's paintings.

Surface roughness and brightness contrast

Though we have mainly focused on the usage of colors, ignoring their spatial arrangement in the first two subsections, the spatial correlation of colors is also important for understanding the artistic style of paintings, as shown in the preceding renormalization analysis, because a painting is a composition of colors in their proper places. The spatial arrangement of colors makes various artistic effects possible. For example, contrast is an important element for expressing shape and space in two-dimensional fine arts. Among various types of contrast, brightness contrast is the most important in art history owing to the cultural background of Europe, which usually adopts the contrast of light and darkness as a metaphorical expression. In this subsection, taking both the color information of pixels and their spatial arrangement into account, we examine the prevalence of brightness contrast in European paintings over 10 artistic periods.

To quantify brightness contrast, we utilize the two-point height difference correlation (HDC) and its roughness exponent α, which is obtained from the slope of the HDC curve in a double logarithmic plot, as used for surface growth models in statistical physics22. First we obtain the gray-scale brightness from the RGB color information through a weighted transformation (see Methods) and define a “brightness surface” of an image by taking the brightness of a pixel as the height at that position of the image, as shown in Fig. 4a and b. A three-dimensional surface, like a deep-pile carpet, is thus obtained from the 2-dimensional painting, and the HDC on it is calculated as a function of distance r. This method is widely used in condensed matter and statistical physics to analyze the roughness of a growing surface, for example a semiconductor surface grown by chemical deposition22. For comparison, a shuffled image, in which pixel positions are randomly permuted, is analyzed as well.

Figure 4

Constructing brightness surfaces and measuring roughness exponents.

(a) and (b) Illustrative examples of brightness surfaces. The brightness of each point is considered as its height. (c) An example of a two-point HDC function G(r) on the brightness surface of the image in the inset, a panel painting by the Italian painter Taddeo Gaddi (1348–1353) titled “St John the Evangelist Drinking from the Poisoned Cup” (This image is out of copyright.). The horizontal axis indicates the distance r, in units of pixels, between two distinct points on the surface. Red points show the HDC of the original image and blue ones represent that of a randomized image. The slope is approximately 2α ≈ 0.28. (d) The HDC function for the image shown in the inset, a painting by the American painter Jackson Pollock (1912–1956) titled “Number 20, 1948, 1948” (This image is reproduced by permission of the Artists Rights Society and Society of Artist's Copyright of Korea, © 2014 The Pollock-Krasner Foundation/ARS, NY - SACK, Seoul), showing no difference from a randomly shuffled image except at short distances of less than 10 pixels, which is less than 1% of the image width.

As shown in Fig. 4a and b, since the brightness of a point is defined as its height, the height difference between two points represents their brightness difference. The two-point HDC of a randomly shuffled painting is displayed as blue dots in Fig. 4c and d for comparison. The slope for randomized images is 0 (α = 0) since no spatial correlation remains. Figure 4d shows an example of Jackson Pollock's drip paintings, which is hard to distinguish from a randomly shuffled painting when only the spatial correlation is considered. The roughness exponent of Jackson Pollock's drip paintings is very small compared with that of other European paintings.

Since the HDC describes the spatial correlation between color pixels on a surface as a function of distance, the slope of the HDC function, which determines the roughness exponent α, characterizes how the average brightness difference changes with distance, i.e., the contrast effect. Figure 5a shows that the roughness exponent α gradually increases over the 10 artistic periods, which is consistent with historical circumstances. First, the increasing tendency of α is related to changes in painting techniques and genres, such as from portraits to landscapes. In the history of western art, many new painting techniques were developed and spread during the Renaissance period. For example, chiaroscuro, one of the canonical painting modes of the Renaissance period16, is characterized by strong contrasts between light and shade. The roughness exponent and the HDC capture the level of brightness and relative spatial position. Hence, the roughness exponent α of a painting could be a quantitative indicator of the chiaroscuro technique, and its increasing tendency over artistic periods reflects the spread of the chiaroscuro technique over the continent21. In addition, the Renaissance art movement led painting genres to become more diverse, so that more portraits and landscape paintings were encouraged. Large objects in paintings, such as a torso (the upper body in portraits) or mountains and sky in landscapes, decrease the brightness difference at short distances but make the increment of the HDC larger as distance increases21. Therefore, the historical innovation of painting techniques and the diversification of painting genres are clearly captured in the increasing tendency of the roughness exponent α.

Figure 5

The trend of roughness exponents and image entropies.

(a) The trend of roughness exponents over 10 art historical periods shows increasing behavior. (b) Statistical tendency of image entropy values of brightness surfaces over the periods; error bars indicate the standard error of the mean.

As another example, sfumato is a major painting mode developed in the Renaissance period to express a vanishing or shading around objects in a painting17. Smoothing the edges of objects in a painting decreases the variance of brightness because it does not allow abrupt changes at boundaries. In this case, image entropy23, which reflects the variance of brightness in a specific locale, would be a good measure of the sfumato technique. Since the variance is inversely related to homogeneity, the image entropy describes the level of local homogeneity of brightness in a painting.

Figure 5b shows that the image entropy H increases up to Neoclassicism and then decreases. This behavior is somewhat different from that of the roughness exponent, because the image entropy only considers the local complexity of the color gradient around a pixel, whereas the roughness exponent also considers brightness differences over remote distances. We think that the different behaviors of these two measures may reflect a tendency for the chiaroscuro technique to keep developing while sfumato declined, possibly because painters rejected mysterious expression in favor of realistic expression.

Discussion

From the analysis of a large-scale European painting image archive, we show that chromo-spectroscopy of 10 art historical periods yields a universal distribution curve which distinguishes paintings from photographs. Additionally, fractal analysis allows us to rediscover the expansion of the color palette after the medieval period, which is consistent with the fact that the color palette of the medieval age was relatively narrow compared with other periods because of historical circumstances. Furthermore, we measure the roughness exponent and image entropy of brightness surfaces over the 10 art historical periods. We find that these mathematical measurements quantitatively describe the birth of new painting techniques and their increasing use. Our approaches thus provide quantitative indicators reflecting historical developments of artistic styles. Applying them, it is possible to deduce that Jackson Pollock's drip paintings are not typical art works, although they remain controversial in the art world.

There are several limitations of our approaches, and we provide suggestions for future work. First, although the database is quite large, our dataset does not cover all paintings of the 10 art historical periods. For this reason, our results may contain sampling bias that we have not yet identified. For better statistics, analyzing much larger (higher-resolution) images such as those of the Google Art Project24 will give more concrete insight into artistic style. Another possible source of error is unintended color distortion when original paintings are converted into digital images, which may cause loss of or bias in color information. Even though we have checked that our results do not change significantly under artificial reductions of color quality, we could not examine all possible distortion effects. It is also true that the present colors of the paintings differ from the original ones at the time of completion. Old paintings are hard to preserve and usually suffer from degradation of their physical materials, such as oxidation and corrosion. These remain major issues not only for this study but also for all stylometric analyses in the arts. Nonetheless, we expect that our quantitative study will help to bridge the gap between art and science.

Methods

Source of dataset and statistics of paintings

In this study, we analyzed the digital images of European paintings in the Web Gallery of Art, which exhibits artworks ranging from the 11th century to the mid-19th century15. The European paintings are classified into 10 art historical periods: medieval, early renaissance, northern renaissance, high renaissance, mannerism, baroque, rococo, neoclassicism, romanticism and realism. We filtered out non-painting images, such as sculptures, miniatures, illustrations, architecture, pottery, glass paintings and wares. The number of remaining images for each period is summarized in SI Table S1. In total we analyzed 8,798 paintings. As shown in Fig. S1, over 94% of images are larger than 700 × 700 pixels and the largest one is 1350 × 1533. Therefore, the quality of the images is good enough to perform a statistical analysis. Furthermore, in order to discuss the difference between paintings and photographs, two more datasets were collected for hyper-realism and photographs. We collected 105 hyper-realism images, the largest of which is 2974 × 1954 pixels, from hyper-realism artists' web sites25,26,27,28,29,30,31, and two sets of photographs from the official Instagram site of National Geographic32 and the online photo gallery of a Korean portal site33.

Box-counting dimensions

In order to investigate the fractal patterns of painted colors in the RGB color space, we measured box-counting dimensions20. The box-counting dimension is defined as follows:

$$d_{\mathrm{box}} = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)}$$

where N(ε) is the number of non-empty boxes and ε is the side length of each box. The value of ε represents the color quality in digitized units; for example, ε = 1 corresponds to 256^3 possible colors in the 24-bit RGB color system and ε = 32 corresponds to 8^3 possible colors in the 9-bit RGB color system. In general, each ε value corresponds to a log2[(256/ε)^3]-bit RGB color system. Varying ε over 32, 16, 8, 4, 2 and 1 (see Figure S6 in the supplement) and examining N(ε) for each ε, we measured dbox(ε).
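As a rough illustration of the box-counting procedure, the sketch below counts non-empty boxes of side ε in RGB space and estimates the dimension from the least-squares slope of log N(ε) against log(1/ε). Estimating the dimension by a straight-line fit over all six ε values is an assumption made here for illustration, and the function names and toy palettes are hypothetical.

```python
import math

def count_boxes(colors, eps):
    """Number of non-empty cubic boxes of side eps in RGB space."""
    return len({(r // eps, g // eps, b // eps) for r, g, b in colors})

def box_counting_dimension(colors, eps_values=(32, 16, 8, 4, 2, 1)):
    """Least-squares slope of log N(eps) versus log(1/eps)."""
    xs = [math.log(1.0 / eps) for eps in eps_values]
    ys = [math.log(count_boxes(colors, eps)) for eps in eps_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical toy palettes: colors confined to a line and to a plane.
gray_line = {(v, v, v) for v in range(256)}
red_green_plane = {(r, g, 0) for r in range(256) for g in range(256)}

dim_line = box_counting_dimension(gray_line)         # close to 1
dim_plane = box_counting_dimension(red_green_plane)  # close to 2
```

A palette that filled the whole RGB cube uniformly would give a value of 3 under the same procedure, which is the homogeneous case mentioned in the Results.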

Gray-scale transformation

To consider brightness surfaces of images, we converted digital color images into grayscale images using the following weighted filter:

where R, G and B are the red, green and blue intensities of a pixel and Igray-scale is the brightness of a certain color, which is interpreted as a height on the image. The weighting values differ because of the color sensitivity of the human eye34; several other weighting filters for R, G and B intensities exist for specific purposes. However, we found no significant difference in the results with different filters.
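A weighted gray-scale conversion of this kind might look like the following sketch. The specific weights used here are the ITU-R BT.601 luma coefficients, chosen only as one common example; they are an assumption and are not confirmed by the text above. The function names and the toy image are likewise hypothetical.

```python
# Assumed weights (ITU-R BT.601 luma coefficients); the weights actually
# used in the study are not stated in the surviving text.
WEIGHTS = (0.299, 0.587, 0.114)

def to_gray(pixel, weights=WEIGHTS):
    """Weighted sum of the R, G and B intensities of one pixel."""
    return sum(w * c for w, c in zip(weights, pixel))

def brightness_surface(img):
    """Map an image (rows of RGB tuples) to a grid of heights."""
    return [[to_gray(p) for p in row] for row in img]

# Hypothetical 2 x 2 toy image: white, black, pure red, pure green.
toy = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 255, 0)]]
surface = brightness_surface(toy)
```

With these weights pure green maps to a greater height than pure red, reflecting the higher sensitivity of the eye to green.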

Two-point height difference correlation function

To measure the roughness exponents of brightness (height) surfaces, a two-point height difference correlation (HDC) function is calculated22. The definition is

$$G(r) = \overline{\left[ h(\mathbf{x}) - h(\mathbf{x}') \right]^2} = \frac{1}{N_r} \sum_{|\mathbf{x} - \mathbf{x}'| = r} \left[ h(\mathbf{x}) - h(\mathbf{x}') \right]^2$$

which follows the simple scaling form, G(r) ~ r^{2α}, for small r. Here r is the distance between two pixel points, the over-bar represents the spatial average at a fixed distance r over all possible points, Nr is the number of possible pairs at a distance r, h(x) is the height at a point x (0 ≤ h(x) ≤ 255) and α is the roughness exponent. The roughness exponent was measured in a double-logarithmic plot of G versus r, with a fitting range from ra = 10 to rb, where the HDC saturates to the same value for both the original and randomized paintings. This upper bound approximately corresponds to 30% of the image width, i.e., the square root of 9% of the image area.
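The measurement can be sketched as follows, assuming the HDC is the mean squared height difference and that the slope in the double-logarithmic plot equals 2α, as in the Fig. 4 caption. For brevity this sketch only uses horizontal and vertical pixel pairs rather than all pairs at distance r, and the function names, fitting range and toy surface are hypothetical.

```python
import math
import random

def hdc(height, r):
    """Mean squared height difference over horizontal and vertical
    pixel pairs separated by r (a simplification of 'all pairs')."""
    rows, cols = len(height), len(height[0])
    total, pairs = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            if x + r < cols:
                total += (height[y][x + r] - height[y][x]) ** 2
                pairs += 1
            if y + r < rows:
                total += (height[y + r][x] - height[y][x]) ** 2
                pairs += 1
    return total / pairs

def roughness_exponent(height, r_min, r_max):
    """Half the least-squares slope of log G(r) versus log r."""
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(hdc(height, r)) for r in range(r_min, r_max + 1)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return (num / den) / 2.0

def shuffle_pixels(height, seed=0):
    """Return a copy with pixel positions randomly permuted."""
    cols = len(height[0])
    flat = [v for row in height for v in row]
    random.Random(seed).shuffle(flat)
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]

# Hypothetical toy surface: a horizontal brightness ramp, 64 x 64 pixels.
ramp = [[4 * x for x in range(64)] for _ in range(64)]
alpha_ramp = roughness_exponent(ramp, 1, 16)                      # close to 1
alpha_shuffled = roughness_exponent(shuffle_pixels(ramp), 1, 16)  # close to 0
```

The smooth ramp gives an exponent near 1, while shuffling destroys the spatial correlation and drives the exponent toward 0, mirroring the comparison between original and randomized images in Fig. 4.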

Image entropy

The entropy of a gray-scale image23 is given by the following equation:

where p(x) = h(x)/S, h(x) is the height at a point x of the brightness surface (0 ≤ h(x) ≤ 255) and S is the sum of all height values in the image for normalization. The weighting factor m(x) is given by m(x) = 1 + σ^2(x), where the local height variance σ^2(x) is calculated over the pixel at position x and its surrounding neighbor pixels. Since this image entropy depends on image size, all images are resized to 500 × 500 pixels by the Lanczos algorithm before measuring the image entropy.