Preface

Martin White

As you walk up Walton Street from the centre of Oxford the road bears slightly to the left and a large 19th-century building comes into view. It is not an Oxford college but the headquarters of the Oxford University Press. OUP is the largest university press in the world, and can date its origins back to around 1480. In 1983 I arrived at this building carrying a Texas Instruments Silent 700 terminal. This used thermal ink printer technology and had two rubber ears on the top into which a telephone handset could be inserted to link the printer into the BT public telephone network through an acoustic coupler. A decade earlier I had used the same technology to access the first computer-based search services developed by the Lockheed Corporation and System Development Corporation.

I was heading up early attempts by Reed Publishing to develop electronically published products and services, notably airline flight timetables. Reed owned International Computaprint Corporation, based in Fort Washington, PA, which specialized in keyboarding and printing telephone directories and airline timetables. Reed had been working with IBM and the University of Waterloo, Canada on the New Oxford English Dictionary (NOED) project, which was to create a digital version of the Oxford English Dictionary. The OED seeks not only to provide a definitive definition of a word, but also the origins of when the word was first used, with examples of subsequent use which may have modified the definition. All these examples were contained on around 4 million slips of paper.

The proof of concept was to digitize one of the Supplements to the First Edition, starting at the letter S. The digitization and indexing had now been completed and I, together with Hans Nickel, the founder and CEO of ICC, was about to demonstrate what we had achieved to the NOED project team led by Tim Benbow and Edmund Weiner. Many of the lexicographers were skeptical of the value of the project, and there was a mixture of expectation and disinterest around the table.

With the terminal we set up a connection (at 300 baud!) to the computer in Fort Washington. I can still remember the first question, which came from one of the more skeptical lexicographers, who wanted to know how many words in the OED originated in the Times newspaper. Because all the text had been marked up in Standard Generalized Markup Language (SGML), a forerunner of XML, we could identify the source, and not only provide a count but print out (albeit very slowly) all the examples. There was a short period of silence and then these distinguished scholars suddenly realized the potential of information retrieval. They also recognised that it was not going to put them out of a job but enable them to improve the value of the product. Many more queries were undertaken and the session only came to an end when we ran out of supplies of thermal paper.
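For readers who would like to see why the markup made that first question answerable, here is a minimal sketch in Python. It is not the NOED system: it uses XML as a stand-in for SGML, and the element names (`entry`, `headword`, `quote`, `source`) and the sample entries are hypothetical, invented purely for illustration. The point is only that once the source of each quotation is tagged, counting the words quoted from a given source is a simple structural query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, invented sample data; XML is used here as a stand-in for SGML.
SAMPLE = """
<dictionary>
  <entry><headword>sample-a</headword>
    <quote><source>Times</source><year>1901</year></quote>
    <quote><source>Daily News</source><year>1899</year></quote>
  </entry>
  <entry><headword>sample-b</headword>
    <quote><source>Times</source><year>1923</year></quote>
  </entry>
  <entry><headword>sample-c</headword>
    <quote><source>Nature</source><year>1910</year></quote>
  </entry>
</dictionary>
"""


def headwords_quoted_in(xml_text, source_name):
    """Return the headwords that have at least one quotation from source_name."""
    root = ET.fromstring(xml_text)
    matches = []
    for entry in root.iter("entry"):
        # Collect the tagged source of every quotation in this entry.
        sources = [s.text for s in entry.iter("source")]
        if source_name in sources:
            matches.append(entry.findtext("headword"))
    return matches


times_words = headwords_quoted_in(SAMPLE, "Times")
print(len(times_words), times_words)  # 2 ['sample-a', 'sample-b']
```

Without the `source` tags the same question would need free-text matching on the word "Times", which would also pick up every quotation that merely mentions the word.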

The NOED project was a great success, not only for the OUP but also for Dr Gaston Gonnet and his team at the University of Waterloo. This team became the nucleus of Open Text Corporation. IBM used the knowledge gained from the project in the development of its search technology as the OED files provided a rich source of syntax information to help with query development.

For me it was a day of discovery about the power of search to discover new relationships between items of information. I learned three important lessons from this project. The first of these was the value of metadata structure in searching. Because of the way that the individual elements of the entries had been marked up in SGML it was easy to search for words that had first been used by Charles Dickens after his return from his first visit to the United States in 1842. The second lesson was gained in listening to the members of the project team from IBM and the University of Waterloo as they talked about the importance of computers being able to understand the structure of sentences, work that would lead to the development of semantic search technologies. The third lesson was in understanding the impact that search could have on organizational processes and outputs.

Almost three decades on from that visit to Oxford I am still fascinated and frustrated by the technology of search and the process of searching. In many respects we have not come all that far from the technology I was using in 1974. Google’s PageRank is not far removed from Dr. Gene Garfield’s development of citation indexes in 1960, and the concepts of recall and precision emerged from research carried out by Cyril Cleverdon at the Cranfield Institute of Technology, UK, in the mid-1960s. The mathematics of vector-space indexing was developed by Dr. Gerald Salton at Cornell University in 1975 and Dr. Michael Lynch founded Autonomy Ltd. in 1996.
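Recall and precision are still the basic yardsticks for search quality, so a small worked example may help. The sketch below uses the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, five known to be relevant.
returned = ["d1", "d2", "d3", "d4"]
known_relevant = ["d2", "d4", "d7", "d8", "d9"]

p, r = precision_recall(returned, known_relevant)
print(p, r)  # 0.5 0.4
```

Here two of the four returned documents are relevant (precision 0.5), but only two of the five relevant documents were found (recall 0.4).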

Enterprise search is now moving from a ‘nice to have’ to a ‘need to have’ application as organizations struggle to find the information they need to make good business decisions. Not only is more information being created but nothing is being thrown away. Search technology is a mixture of the mathematical management of probability and computational linguistics but this book is not about technology. It is about meeting the expectations of users by investing in the skills and experience needed to manage the technology. Whether you are a business manager, IT manager or information professional I hope that when you finish this book you will set up a meeting with your HR Manager and start the process of staffing up your search support team before any further investment in technology.

As you read this book I hope you find what you are looking for.

How to Use This Book

This book has been written to help business managers, and the IT teams supporting them, understand why effective enterprise-wide search is essential in any organization, and how to go about the process of meeting user requirements. This could be by improving the existing search application(s) or by specifying and implementing a new search application. Search technology is not easy to understand without a good background in applied mathematics or information science. This book has just two chapters out of twelve on search technology, with the objective of providing just enough detail to understand the possibilities offered by enterprise search and the software available on commercial and open-source terms.

A good place to start might be Chapter 12 on critical success factors. If you are not able to meet at least eight of the twelve success factors then you really do need to read this book.

Chapter 1 and Chapter 2 set the scene, explaining why effective enterprise search is essential to any organization. Over the last couple of years a number of surveys have been published which show that most organizations are finding it increasingly difficult to find information that has been created at some considerable cost in terms of staff time. It is not just that the volume of information being created has increased but that low storage costs mean that nothing is now thrown away. The user research techniques described in Chapter 3 may well come up with some uncomfortable outcomes as you may find that your colleagues are reduced to emailing around the organization to find the information they need to make business-critical decisions. Chapter 4 considers the elements of an enterprise search strategy, highlighting the importance of allocating an adequate level of staffing to the support of search. An organization with more than 1000 employees probably needs a search support team of two people, and above around 10,000 employees this will double.

Chapter 5 and Chapter 6 provide an outline of the technology of search. The search functionality described in Chapter 5 is the base-level technology that can be delivered by virtually all search applications and then Chapter 6 offers an overview of functionality that will often mark out differences between various search software options. The search business is not a large one. There are perhaps no more than 100 vendors globally and the structure of the industry and the challenges it faces are discussed in Chapter 7.

If the result of the user research and business planning is that a new search application is required then Chapter 8 and Chapter 9 cover the process of defining the business and search requirements, the evaluation of commercial and open-source software and the management of the installation and implementation.

If you only have time to read one chapter please read Chapter 10. The reason for the well-documented lack of satisfaction with a search application is that organizations invest in technology but not staff with the expertise and experience to gain the best possible return on the investment through reviewing search logs and monitoring changes in user requirements. Finally Chapter 11 gives an overview of some of the current directions in search development.

There is one topic that is not covered in this book, and that is the design of search user interfaces. This is a very important topic in its own right and many excellent books have been, and are being, written.

The book concludes with a list of books and blogs on information retrieval and enterprise search, lists of search vendors and search integrators and a glossary.

Safari® Books Online

Note

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/Enterprise-Search.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

I could not have written this book without the generous support of many colleagues over quite a number of years. In particular my good friend Miles Kehoe (New Idea Engineering) read through every line of the book at quite short notice and made many invaluable comments and suggestions. Over the years I have learned a great deal about the search business and search technology from Miles and his business partner Mark Bennett and it is a great shame that Miles and his team are eight time zones away from Horsham. Despite the diligent work that Miles has undertaken the responsibility for errors and omissions is mine alone.

Stephen Arnold, my co-author for Successful Enterprise Search Management, has been a constant source of insight into the technologies and business of search for over a decade. Charlie Hull (Flax) has patiently educated me about open-source search implementation and Valentin Richter (Raytion) has done the same for the implementation of commercial enterprise search applications.

Information Today Inc. have given me the opportunity to participate in the Enterprise Search Summit and Enterprise Search Summit Fall conferences in the USA and supported my ambition to establish Enterprise Search Europe in 2011. Also at Information Today Michelle Manafy and then Theresa Cramer allowed me to voice my opinions on search in a long-running Eureka column in e-Content Magazine.

In 2011 the Institute for Prospective Technological Studies, Joint Research Centre, European Commission awarded me a contract to undertake a techno-economic study of the enterprise search market in the EU. The research undertaken during the project has been of great value in writing this book and I am grateful to Dr Stavri Nikolov, the IPTS project manager, for his support and insight throughout the project.

Tony Byrne (Real Story Group) has been a constant support to me for over a decade in helping me understand search from both a vendor and user perspective, and generously allowed me to use the RSG enterprise search glossary as the basis for the glossary in this book.

Other colleagues whose contributions in various ways have shaped my understanding of search technology and search good practice include Denise Bedford, Jed Cawthorne, Paul Clough, Mike Davis, Susan Farrell, Susan Feldman, David Hawking, John E. Hall, Cathy Hein, Jane McConnell, Elizabeth Marsh, Kristian Norling, Howard McQueen, Peter Morville, Lynda Moulton, Matt Mullen, Mike O’Donoghue, Leslie Owen, Alan Pelz-Sharpe, Avi Rappoport, James Robertson, Lou Rosenfeld and Mark Vadgama. Helen Carley at Facet Publishing and Steve Newton at Galatea were the publishers of two previous books on search. Dr. David James and his colleagues at the Royal Society of Chemistry have been a pleasure to work with in my role as Chair of the RSC e-Content Committee.

Janus Boye (JBoye), Kurt Kragh Sørensen (IntraTeam), Jakob Nielsen (Nielsen Norman Group) and Erik Hartman have given me many opportunities to run search workshops at their events, and these have been an invaluable opportunity to learn from the experiences of enterprise search managers.

Over the last decade I have carried out many enterprise search consulting assignments but I am not in a position to list the organizations involved. Each of these assignments has given me additional insights into the technology and use of enterprise search.

I would like to thank Simon St. Laurent and Meghan Blanchette at O’Reilly Media for their support in bringing this book from an initial idea to a published book in just ten months. It is a privilege to be an O’Reilly author. In my book Intranet Management Handbook, published in 2011, I announced that I would not be writing any more books. This is an e-book. It doesn’t count!

It has not been easy for my wife Cynthia when people ask her what I do for a living. Being an information scientist is fascinating for me but difficult for Cynthia to describe. She has been immensely supportive during eleven career changes and eight books. Our son Simon manages my IT requirements including the design and support of the Intranet Focus web site and I must mention his wife Andrea.

During the course of writing this book our first grandchild was born to Nick and Andrea. (We have two Andreas and two Dr. Whites in our family!) Noah had a very difficult start to his life but is now progressing very well. This book is dedicated to him so that when he begins his career in a very digital world he can say that it was his grandfather who wrote the first e-book on enterprise search. You will come across his name again in this book.
