Building Tag Clouds in Perl and PHP by Jim Bumgardner

Magnifying the Long Tail (Inverse Power Mapping in Perl)

The uniformity of the font sizes I noted earlier is still a problem. The reason for this is that the tag counts are arranged in a power curve (Figure 13). Power curves are a very common phenomenon found in popularity or frequency data collected from human activity.

Figure 13. A power curve

There tend to be very few large values in the data, and lots and lots of small values. The problem with mapping a power curve to a limited set of font sizes is that the "long tail" of the power curve ends up represented by just one or two font sizes. Many of the intermediate font sizes won't get used at all because of the large gaps between the counts of the most popular words.
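To see the problem concretely, here is a minimal Perl sketch that maps a hypothetical set of power-curve tag counts linearly onto five font sizes and tallies how many tags land on each size. The Zipf-like counts (roughly 1000/r for the tag at rank r), the five-step size range, and the helper linear_size are illustrative assumptions, not code from this article.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical tag counts with a rough power-curve shape:
# the tag at rank r gets about 1000 / r uses (Zipf-like).
my %counts = map { ("tag$_" => int(1000 / $_)) } 1 .. 50;

my ($min_size, $max_size) = (1, 5);   # five font-size steps (assumed)
my @values = values %counts;
my ($min, $max) = (min(@values), max(@values));

# Linear mapping (hypothetical helper): spread the raw counts evenly
# across the font sizes, rounding to the nearest step.
sub linear_size {
    my ($count, $lo, $hi, $smin, $smax) = @_;
    return $smin if $hi == $lo;            # all counts equal: avoid divide-by-zero
    my $frac = ($count - $lo) / ($hi - $lo);
    return int($smin + ($smax - $smin) * $frac + 0.5);
}

# Tally how many tags land on each font size.
my %per_size;
$per_size{ linear_size($_, $min, $max, $min_size, $max_size) }++ for @values;
printf "size %d: %d tag(s)\n", $_, $per_size{$_} // 0 for $min_size .. $max_size;
# prints:
# size 1: 44 tag(s)
# size 2: 4 tag(s)
# size 3: 1 tag(s)
# size 4: 0 tag(s)
# size 5: 1 tag(s)
```

With these made-up counts, 44 of the 50 tags end up at the smallest size and size 4 is never used, which is the crowding of the long tail described above.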

The way to make this tag cloud look better is to use a logarithmic function to reverse the power curve's effects. Essentially, we will map the linear range of font values to the logarithmic range of tag counts, magnifying the differences between smaller counts and making the "long tail" of the power curve more visible (Figures 14 and 15).
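One way to write the two mappings, assuming font sizes run from s_min to s_max and tag counts from c_min to c_max (symbols introduced here for illustration, not taken from the article), is:

```latex
\begin{align}
s_{\text{lin}}(c) &= s_{\min} + (s_{\max} - s_{\min})\,\frac{c - c_{\min}}{c_{\max} - c_{\min}} \\
s_{\log}(c)       &= s_{\min} + (s_{\max} - s_{\min})\,\frac{\ln c - \ln c_{\min}}{\ln c_{\max} - \ln c_{\min}}
\end{align}
```

For example, with made-up values c_min = 20 and c_max = 1000, a tag with c = 100 sits (100 - 20)/980, or about 0.08, of the way up the size range under the linear mapping, but ln 5 / ln 50, or about 0.41, of the way up under the logarithmic one, so mid-range counts get visibly larger fonts.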

Figure 14. Linear mapping of x to y

Figure 15. Logarithmic mapping of x to y
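A minimal Perl sketch of the logarithmic mapping, assuming the same kind of hypothetical Zipf-like counts and a five-step font scale. The helper log_size is illustrative only and not necessarily how the article's own code computes its logarithmic measure.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical tag counts with a rough power-curve shape:
# the tag at rank r gets about 1000 / r uses (Zipf-like).
my %counts = map { ("tag$_" => int(1000 / $_)) } 1 .. 50;

my ($min_size, $max_size) = (1, 5);   # five font-size steps (assumed)
my @values = values %counts;
my ($min, $max) = (min(@values), max(@values));

# Logarithmic mapping (hypothetical helper): position each count by its
# logarithm, so equal ratios between counts become equal steps in size.
# log() is Perl's natural logarithm; counts must be positive.
sub log_size {
    my ($count, $lo, $hi, $smin, $smax) = @_;
    return $smin if $hi == $lo;            # all counts equal: avoid divide-by-zero
    my $frac = (log($count) - log($lo)) / (log($hi) - log($lo));
    return int($smin + ($smax - $smin) * $frac + 0.5);   # round to nearest step
}

# Tally how many tags land on each font size.
my %per_size;
$per_size{ log_size($_, $min, $max, $min_size, $max_size) }++ for @values;
printf "size %d: %d tag(s)\n", $_, $per_size{$_} // 0 for $min_size .. $max_size;
# prints:
# size 1: 20 tag(s)
# size 2: 19 tag(s)
# size 3: 7 tag(s)
# size 4: 3 tag(s)
# size 5: 1 tag(s)
```

Because the logarithm compresses the gap between the few very large counts and the many small ones, the smaller sizes are no longer crowded into a single step: with these counts every one of the five sizes is used. In a real cloud you would call log_size on each tag's count to choose its font size.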

To do this, we'll add a logarithmic measure of the tag counts: ...
