Search Techniques
Professionals looking for intelligence on the Internet find the effort fruitful,
if not always easy. The information provided in this and the next two
chapters assumes that the reader has some familiarity with personal computers
and surfing the Internet. Here, the reader will receive pointers from
those who rely on the Internet for open-source information on a daily
basis. The bibliography contains reference volumes where a reader can
find a complete introduction to the Internet and its exploitation.
On the Internet, most Web pages are written in Hypertext Markup
Language (HTML), which allows the creation of, and access to, structured
documents (including text, pictures, and other content), but many
sites also have scripts and features in programming languages such as
and effects provided by Flash Player
and other plug-ins.
Also appearing on the Internet are text in American Standard Code for
Information Interchange (ASCII), gateways into databases, and dynamic
content pulled into the browser in a combination of static and continuously
changing windows. Web sites’ displays of stock market updates, news
headlines, moving pictures, live videos, and other content are designed to
attract and entertain users, call attention to advertising, and communicate
in more sophisticated ways than static, two-dimensional documents. The
analyst needs to look beyond the flash and find the facts, then capture the
Web page contents needed (at least the text containing the facts found).
Fortunately for searchers, expertise in the Internet’s structural components
is not needed for finding intelligence. However, it is important to
remember that the programming behind everything we see on the Web
is responsible for how it is displayed, and we should capture content by
using appropriate tools, to ensure professional information collection and
retention. Because the content of Web sites can change frequently, it is
imperative for an analyst/investigator to capture the content properly at
the time of the search.
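
As an illustration of capturing content at the time of the search, the following is a minimal sketch rather than a recommended tool. It assumes the analyst has already obtained the page text (for example, by copying it from the browser) and simply files it with the source URL and a UTC timestamp. The function name `save_capture`, the JSON layout, and the file-naming pattern are hypothetical choices made for this example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_capture(url: str, page_text: str, out_dir: Path) -> Path:
    """Write page text plus capture metadata to a timestamped JSON file.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    captured_at = datetime.now(timezone.utc)
    record = {
        "url": url,                                   # where the content was found
        "captured_at_utc": captured_at.isoformat(),   # when it was captured
        "text": page_text,                            # the text containing the facts
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    file_name = f"capture_{captured_at.strftime('%Y%m%dT%H%M%S%fZ')}.json"
    out_path = out_dir / file_name
    out_path.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

Calling `save_capture("https://example.com/page", "text copied from the page", Path("captures"))` would return the path of the new file. A record like this preserves only the text; a screenshot or saved copy of the full page would be needed to retain layout, images, and other content.
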
Every Internet page has a Uniform Resource Locator (URL) address
(i.e., what you see in the box at the top of the browser), which is translated
by a Domain Name System (DNS) server into the correct Internet Protocol
(IP) address, to which the browser is directed. Visiting an Internet page
allows the browser to pull its content (via a stream of packets) into a com-
puter and onto a screen that the user can peruse. Each link on which the
user clicks takes the browser to a new Web page. Before clicking on a link,
the user must decide whether it is a likely source of useful information. As
Internet information collectors become more familiar with sources, they
initially assess the potential authority, accuracy, and reliability of refer-
ences by the URLs of Web sites found in search tools. Further, it is often
necessary to trace the authors and owners of Web sites in order to collect
and analyze information and find leads on those hosting and posting
on the Internet (more on that in Chapter 16). Therefore, it is important to
understand the basics of URLs, IP addresses, and their roles.
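
To make the relationship between a URL, its host name, and an IP address concrete, here is a small sketch using only the Python standard library. The function names `describe_url` and `resolve_host` are hypothetical, and the top-level-domain logic is deliberately naive (it takes the last dot-separated label of the host name, which is not meaningful when the host is itself a numeric IP address).

```python
import socket
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the parts an analyst typically inspects first."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "scheme": parts.scheme,   # e.g. "https"
        "host": host,             # the name a DNS server translates
        # Naive: last label of the host name (assumption for illustration).
        "top_level_domain": labels[-1] if len(labels) > 1 else "",
        "path": parts.path,       # the page requested on that host
    }


def resolve_host(host: str) -> list[str]:
    """Ask the system resolver for the IP addresses of a host.

    Requires network access for real host names; results vary over time.
    """
    infos = socket.getaddrinfo(host, None)
    # Each entry ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `describe_url("https://www.example.gov/reports/2012.html")` reports the host as `www.example.gov` and the top-level domain as `gov`, which is the kind of cue a collector might weigh before clicking a link. `resolve_host` performs the same translation the browser relies on, so its output depends on the network and resolver in use.
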
The many types of Internet content can be daunting to analysts, as it is
necessary to nd ways to identify, lter, capture, evaluate, and report the
text, photos, videos, audio, and other content about a subject of interest.
Generally, the content is in digital format, so it can be copied and filed by
the investigator, and digitally signed if necessary to preserve its integrity
as evidence. The systems and tools available provide a solid start, but
there is still no substitute for a trained, experienced, knowledgeable, and
creative analyst, who understands how to look for, find, assess, and report
what is needed. Contrary to popular belief, there is no application, not
even Google, that can easily find everything you are looking for.
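
On the point above about preserving the integrity of filed content, one simple building block is a cryptographic hash recorded at the time of capture. The sketch below is not a digital signature: a hash only shows whether a file has changed, whereas a signature also ties the file to a signer's key. The function names `sha256_of_file` and `is_unchanged` are hypothetical, and SHA-256 is chosen here as an assumption, not as a method the text prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large captures (e.g. video) need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """Compare a file's current digest with the one recorded at capture."""
    return sha256_of_file(path) == recorded_digest
```

An investigator might note the digest in the case file when the content is first saved and later call `is_unchanged` to confirm that the stored copy still matches.
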
Google and other search engines operate on Web sites that interact with
the user’s browser. Professional investigators should become familiar with
the functions of their browser when a search is conducted, and may wish
