Magnifying the Long Tail (Inverse Power Mapping in Perl)

The uniformity of the font sizes I noted earlier is still a problem. The reason for this is that the tag counts are arranged in a power curve (Figure 13). Power curves are a very common phenomenon found in popularity or frequency data collected from human activity.

Figure 13. A power curve

There tend to be a few very large values in the data, and lots and lots of small values. The problem with mapping a power curve to a limited set of font sizes is that the "long tail" of the power curve ends up getting represented by just one or two font sizes. Many of the intermediate font sizes won't get used at all because of the larger gaps between the counts of the most popular words.
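To make the problem concrete, here is a minimal sketch of a straight linear mapping from counts to font sizes. The tag names, counts, font-size range, and the `linear_size` helper are all made up for illustration and are not taken from the book's code.

```perl
use strict;
use warnings;

# Hypothetical tag counts that roughly follow a power curve (illustrative data only).
my %counts = (
    perl => 1000, php  => 400, cloud => 50, tag => 12,
    font => 5,    css  => 2,   tail  => 1,
);

# Assumed font-size range in points; not specified in the text.
my ($min_size, $max_size) = (12, 36);

# Linear mapping: a count's position between the smallest and largest
# count is used directly as its position in the font-size range.
sub linear_size {
    my ($count, $min_count, $max_count) = @_;
    return $min_size if $max_count == $min_count;    # avoid division by zero
    my $ratio = ($count - $min_count) / ($max_count - $min_count);
    return int($min_size + $ratio * ($max_size - $min_size) + 0.5);
}

my @sorted = sort { $a <=> $b } values %counts;
my ($min_count, $max_count) = @sorted[0, -1];

# Print tags from most to least popular with their linear font size.
for my $tag (sort { $counts{$b} <=> $counts{$a} } keys %counts) {
    printf "%-6s count=%4d size=%d\n",
        $tag, $counts{$tag}, linear_size($counts{$tag}, $min_count, $max_count);
}
```

With this sample data, the four least popular tags all land on the minimum size, and most of the sizes between the minimum and maximum are never used.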

The way to make this tag cloud look better is to use a logarithmic function to reverse the power curve's effects. Essentially, we will map the linear range of font values to the logarithmic range of tag counts, magnifying the differences between smaller counts and making the "long tail" of the power curve more visible (Figures 14 and 15).

Figure 14. Linear mapping of x to y

Figure 15. Logarithmic mapping of x to y
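As a rough illustration of the logarithmic mapping just described, the sketch below scales the logarithm of each count, rather than the count itself, onto the font-size range. It is an independent example under stated assumptions, not the book's listing: the sample counts, the 12 to 36 point range, and the `log_size` helper are hypothetical, and every count is assumed to be at least 1.

```perl
use strict;
use warnings;

# Hypothetical tag counts that roughly follow a power curve (illustrative data only).
my %counts = (
    perl => 1000, php  => 400, cloud => 50, tag => 12,
    font => 5,    css  => 2,   tail  => 1,
);

# Assumed font-size range in points; not specified in the text.
my ($min_size, $max_size) = (12, 36);

# Logarithmic mapping: the position of log(count) between log(min) and
# log(max) is used as the position in the font-size range.
# Assumes all counts are >= 1, so log() is defined and non-negative.
sub log_size {
    my ($count, $min_count, $max_count) = @_;
    return $min_size if $max_count == $min_count;    # avoid division by zero
    my $ratio = (log($count) - log($min_count))
              / (log($max_count) - log($min_count));
    return int($min_size + $ratio * ($max_size - $min_size) + 0.5);
}

my @sorted = sort { $a <=> $b } values %counts;
my ($min_count, $max_count) = @sorted[0, -1];

# Print tags from most to least popular with their logarithmic font size.
for my $tag (sort { $counts{$b} <=> $counts{$a} } keys %counts) {
    printf "%-6s count=%4d size=%d\n",
        $tag, $counts{$tag}, log_size($counts{$tag}, $min_count, $max_count);
}
```

With this sample data, each of the seven tags receives a different font size, so the small counts in the tail become distinguishable from one another.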

To do this, we'll add a logarithmic measure of the tag counts: ...
