What combinations of search syntaxes will and will not fly in your Google search?
There was a time when you couldn't "mix" Google's special syntaxes [Section 1.5]; you were limited to one per query. And while Google released ever more powerful special syntaxes, not being able to combine them for their composite power stunted many a search.
This has since changed. While there remain some syntaxes that you just can't mix, there are plenty to combine in clever and powerful ways. A thoughtful combination can do wonders to narrow a search.
The antisocial syntaxes are the ones that won't mix and should be used individually for maximum effect. If you try to use them with other syntaxes, you won't get any useful results.
The syntaxes that request special information (stocks: [Hack #18], rphonebook:, bphonebook:, and phonebook: [Hack #17]) are all antisocial syntaxes. You can't mix them and expect to get a reasonable result.
The other antisocial syntax is the link: syntax. The link: syntax shows you which pages have a link to a specified URL. Wouldn't it be great if you could specify what domains you wanted the pages to be from? Sorry, you can't. The link: syntax does not mix.
For example, say you want to find out what pages link to O'Reilly & Associates, but you don't want to include pages from the .edu domain. The query link:www.oreilly.com -site:edu will not work, because the link: syntax doesn't mix with anything else. Well, that's not quite correct. You will get results, but they'll be for the phrase "link www.oreilly.com" from domains that are not .edu.
If you want to search for links and exclude the domain .edu, you have a couple of options. First, you can scrape the list of results [Hack #44] and sort it in a spreadsheet to remove the .edu domain results. If you want to try it through Google, however, there's no command that will absolutely work. This one's a good one to try:
inanchor:oreilly -inurl:oreilly -site:edu
This search looks for the word O'Reilly in anchor text, the text that's used to define links. It excludes those pages that contain O'Reilly in the URL (e.g., oreilly.com). And, finally, it excludes those pages that come from the .edu domain.
But this type of search is nowhere near complete. It only finds those links to O'Reilly that include the string oreilly; if someone creates a link like <a href="http://perl.oreilly.com/">Camel Book</a>, it won't be found by the query above. Furthermore, there are other domains that contain the string oreilly, and possibly domains that link to oreilly that contain the string oreilly but aren't oreilly.com.
You could alter the string slightly, to omit the oreilly.com site itself, but not other sites containing the string oreilly:
inanchor:oreilly -site:oreilly.com -site:edu
But you'd still be including many O'Reilly sites that aren't at oreilly.com.
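The first option mentioned above, post-filtering a scraped list of linking pages rather than asking Google to do it, can be sketched in a few lines of Python. This is only an illustration: the function name drop_domains and the sample URLs are hypothetical, and it assumes you already have the result URLs in a list.

```python
from urllib.parse import urlparse

def drop_domains(urls, excluded_suffixes):
    """Return the URLs whose host does not fall under any excluded suffix."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Compare on a label boundary so "notedu" is not mistaken for ".edu".
        if any(host == s or host.endswith("." + s) for s in excluded_suffixes):
            continue
        kept.append(url)
    return kept

# Hypothetical scraped results, for illustration only.
linking_pages = [
    "http://www.example.edu/books.html",
    "http://www.example.com/perl-links.html",
    "http://www.example.org/reading-list.html",
]
print(drop_domains(linking_pages, ["edu"]))
```

Passing ["edu", "oreilly.com"] instead would also drop pages hosted on oreilly.com and its subdomains, which is the effect the mixed queries above can only approximate.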
So what does mix? Pretty much everything else, but there's a right way and a wrong way to do it.
Don't mix syntaxes that will cancel each other out, such as:
site:ucla.edu -inurl:ucla
Here you're saying you want all results to come from ucla.edu, but that site results should not have the string "ucla" in the results. Obviously that's not going to result in very much.
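If you build queries in a script, a crude check can catch this kind of self-cancelling combination before you run it. The sketch below is an assumption-laden illustration, not a description of how Google parses queries: find_conflicts is a hypothetical helper, and it only looks for a -inurl: term that appears as a substring of a required site: value.

```python
def find_conflicts(query):
    """Return (site, excluded) pairs where a -inurl: term occurs in a site: value."""
    terms = query.split()
    # Required sites only; "-site:" exclusions start with "-" and are skipped.
    sites = [t[len("site:"):].lower() for t in terms if t.startswith("site:")]
    excluded = [t[len("-inurl:"):].lower() for t in terms if t.startswith("-inurl:")]
    return [(s, e) for s in sites for e in excluded if e and e in s]

print(find_conflicts("site:ucla.edu -inurl:ucla"))
print(find_conflicts("inanchor:oreilly -inurl:oreilly -site:edu"))
```

The first query is flagged because every URL on ucla.edu contains "ucla"; the second is not, since it has no required site: term to contradict.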
Don't overuse single syntaxes, as in:
site:com site:edu
While you might think you're asking for results from either .com or .edu sites, what you're actually saying is that site results should come from both simultaneously. Obviously a single result can come from only one domain. Take the example:
perl site:edu site:com
This search will get you exactly zero results. Why? Because a result page cannot come from a .edu domain and a .com domain at the same time. If you want results from .edu and .com domains only, rephrase your search like this:
perl (site:edu | site:com)
With the pipe character (|), you're specifying that you want results to come either from the .edu or the .com domain.
Don't use allinurl: or allintitle: when mixing syntaxes. It takes a careful hand not to misuse these in a mixed search. Instead, stick to inurl: or intitle:. If you don't put allinurl: in exactly the right place, you'll create odd search results. Let's look at this example:
allinurl:perl intitle:programming
At first glance it looks like you're searching for the string "perl" in the result URL, and the word "programming" in the title. And you're right, this will work fine. But what happens if you move allinurl: to the right of the query?
intitle:programming allinurl:perl
This won't get any results. Stick to inurl: and intitle:, which are much more forgiving of where you put them in a query.
Don't use so many syntaxes that you get too narrow, like:
intitle:agriculture site:ucla.edu inurl:search
You might find that it's too narrow to give you any useful results. If you're trying to find something that's so specific that you think you'll need a narrow query, start by building a little bit of the query at a time. Say you want to find plant databases at UCLA. Instead of starting with the query:
intitle:plants site:ucla.edu inurl:database
Try something simpler:
databases plants site:ucla.edu
and then try adding syntaxes to keywords you've already established in your search results:
intitle:plants databases site:ucla.edu
or:
intitle:database plants site:ucla.edu
If you're trying to narrow down search results, the intitle: and site: syntaxes are your best bet.
For example, say you want to get an idea of what databases are offered by the state of Texas. Run this search:
intitle:search intitle:records site:tx.us
You'll find 32 very targeted results. And of course, you can narrow down your search even more by adding keywords:
birth intitle:search intitle:records site:tx.us
It doesn't seem to matter if you put plain keywords at the beginning or the end of the search query; I put them at the beginning, because they're easier to keep up with.
The site: syntax, unlike site syntaxes on other search engines, allows you to get as general as a domain suffix (site:com) or as specific as a domain or subdomain (site:thomas.loc.gov). So if you're looking for records in El Paso, you can use this query:
intitle:records site:el-paso.tx.us
and you'll get seven results.
Sometimes you'll want to find a certain type of information, but you don't want to narrow by type. Instead, you want to narrow by theme of information; say you want help or a search engine. That's when you need to search in the URL.
The inurl: syntax will search for a string in the URL but won't count a match that sits inside a larger word. So, for example, if you search for inurl:research, Google will not find pages from researchbuzz.com, but it would find pages from http://www.research-councils.ac.uk.
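One way to picture this behavior is to treat a URL as a sequence of words separated by punctuation and ask whether the search term is one of those words. The sketch below is only an approximation written for illustration: url_has_word is a hypothetical function, and splitting on non-alphanumeric characters is an assumption, not a description of how Google actually tokenizes URLs.

```python
import re

def url_has_word(url, word):
    """Approximate inurl: by checking for the word as a whole token in the URL."""
    # Split on anything that is not a letter or digit (dots, slashes, hyphens...).
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return word.lower() in tokens

print(url_has_word("researchbuzz.com", "research"))
print(url_has_word("http://www.research-councils.ac.uk", "research"))
```

Under this approximation, "researchbuzz" is a single token that merely contains "research", so it does not match, while the hyphen in "research-councils" makes "research" a token of its own.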
Say you want to find information on biology, with an emphasis on learning or assistance. Try:
intitle:biology inurl:help
This takes you to a manageable 162 results. The whole point is to get a number of results that finds you what you need but isn't so large as to be overwhelming. If you find 162 results overwhelming, you can easily add the site: syntax to the search and limit your results to university sites:
intitle:biology inurl:help site:edu
But beware of using so many special syntaxes, as I mentioned above, that you detail yourself into no results at all.
It's possible that I could write down every possible syntax-mixing combination and briefly explain how they might be useful, but if I did that, I'd have no room for the rest of the hacks in this book.
Experiment. Experiment a lot. Keep in mind constantly that most of these syntaxes do not stand alone, and you can get more done by combining them than by using them one at a time.
Depending on what kind of research you do, different patterns will emerge over time. You may discover that focusing on only PDF documents (filetype:pdf) finds you the results you need. You may discover that you should concentrate on specific file types in specific domains (filetype:ppt site:tompeters.com). Mix up the syntaxes as many ways as is relevant to your research and see what you get.