Chapter 1. Searching the Enterprise

It seemed like a normal day when you arrived at work and turned on your computer. Then, the phone rang. Colleagues of yours were just about to go into a meeting with a prospective customer, and they needed details about custom software they had proposed installing. You went to search for those details, and they weren’t in the standard specification sheet, nor were they in the release notes, nor were they in any of the first fifty results your company’s search tool produced.

Every Day Is a Decision Day

We have to make many decisions every day. Each of those decisions requires enough information to make it as risk-free as possible. In many cases, though, we probably do not have the time needed to find all the relevant information. We probably pride ourselves on being good enough managers not to need information; our experience enables us to make the decision!

Every day, however, people make the news headlines because they made the wrong decision. The financial meltdown of 2008 was arguably an information problem. Loans had been made to people purchasing homes without adequate security. The pressures of making sales targets led to an inadequate review of the circumstances of the people asking for loans, and senior managers in the banks had no information about the scale of the problem. While your decisions may not result in you making the news, a failure to make the best decision possible on the basis of the best available information could be bad news for your career.

Once upon a time you could at least walk into your office in the morning and feel reasonably certain about the decisions you would need to make. With the arrival of 24/7 mobile access, reductions in staff, and difficult economic and market conditions, you may well get a call at any time during the day from colleagues just about to walk into a sales opportunity who have just realized that they do not have a critical piece of information about the client or the proposal they are making.

That puts the pressure on to find information that could have a very positive impact on the bottom line. Fortunately your company has invested in an enterprise search application, so you enter a few keywords into the search box and sit back. Within a few seconds you discover either that the company seems to have no information at all on the query you have made, or that there are over 3000 items of information and you have only a few minutes to provide a response to your colleague.

When we are dealing with decisions that are based on some standard business processes, such as setting up a project or writing a monthly report, then we often rely on browsing through the information architecture of an intranet, shared file collection or a document management system to find the information we need.

Search becomes critical when there is time pressure and a need for an immediate solution. We expect it to be as easy to use as Google and at least as effective, providing us with the information we need on the first page of results. Anything less, and the search application is regarded as a failure. Google and Bing have huge scale, and an immense amount of development has gone into providing search experiences across the Web. Searching for information inside a single company seems like it should be easier, but often isn’t.

This book will provide you with enough information to understand how enterprise search works, to choose the right solution for your company, and then to get the best return on the investment in the technology and in the people who are responsible for making sure search works.

Information as a Corporate Asset

Many companies attach asset numbers to all of their property, be it a wastepaper bin or a complex machine tool. All those assets are logged in a database and their residual financial value will be given on the balance sheet of the company. The balance sheet will also show the financial assets of the business.

No matter how hard you look there will be two corporate assets missing from the balance sheet. One of these is the employees, though at least there will be a record in the Annual Report of how many employees there are, possibly categorized by location or gender. But what about the information assets of the business, and the knowledge assets possessed by each employee no matter what their age or grade? International accounting standards do not allow for information to be capitalized as an asset because there is no definitive way of calculating its value. The value of a piece of information is unique to an individual at a particular point in time. In search terms it has a different ‘relevance’ for each person, and there is much more to say about relevance later in this book.

Not only is every physical asset recorded by the company, but someone will own the asset and make a decision about when and how it should be replaced or upgraded. In most companies no one owns information as a corporate asset, even though there may be someone with the title of Chief Information Officer. There is now a growing concern among senior managers about the sheer scale of corporate information resources with the arrival of the concept of Big Data. With hundreds of applications being used each day inside even a modest-sized company, the amount of data and information that is being collected is often poorly understood. Worse, because of the low cost of storage nothing is ever deleted, so the rate of growth is a combination of new information and old information, with the assumption that all information has a value. It may do if it can be found!

The term ‘unstructured information’ is widely used to describe documents, emails, blogs and other text information, and more recently rich media. In fact this information does have a structure, in that there is usually a title, an author, a date, and perhaps section headings and tables. The term came into use to distinguish these categories of information from ‘structured’ databases where data is stored in defined fields such as Address Line 1, Address Line 2, Town etc. For many years the UK search vendor Autonomy made much of the fact that unstructured information represented 80% of the total information assets of the organization. No evidence was ever presented for this assertion, which seemed to be based solely on the Pareto principle. More importantly, there is no relationship between volume and value.

Until recently enterprise search was used primarily to search unstructured text and so had to be able to cope with the issues of language and semantics.

Take these two sentences:

Noah loaded boxes into the van.
Noah loaded the van with boxes.

In the case of the first sentence the number of boxes could be any number from two upwards. In the second sentence there is the implicit message that the van was totally full of boxes, though we cannot be sure.

The textual differences between the sentences are very small but semantically very important. In almost every conversation we have we are constantly checking whether we have fully understood what others are saying, perhaps asking for clarification from time to time. In the case of a document that might have been written several years ago we cannot have this type of conversation, and yet we expect a search engine to be able to read and understand the document, and then be able to say with certainty that the document contains information that is relevant to the search we have carried out and list it in the top few results.
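How small the textual difference is can be shown with a minimal sketch that treats each sentence as a simple set of words, which is roughly what basic keyword matching does. The function names `tokens` and `jaccard` are illustrative choices, not part of any search product, and real search engines use far more sophisticated text processing than this.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lower-case a sentence and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of all distinct words that the two sets have in common."""
    return len(a & b) / len(a | b)


first = "Noah loaded boxes into the van."
second = "Noah loaded the van with boxes."

first_words = tokens(first)
second_words = tokens(second)

shared = sorted(first_words & second_words)
only_first = sorted(first_words - second_words)
only_second = sorted(second_words - first_words)
overlap = jaccard(first_words, second_words)

print("shared:", shared)
print("only in first:", only_first)
print("only in second:", only_second)
print(f"word overlap: {overlap:.2f}")
```

Five of the seven distinct words are shared, and the sentences differ only in ‘into’ and ‘with’. A comparison based on words alone sees two nearly identical sentences, and nothing in it captures the implication that the van may have been full.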

The Information Paradox

One of the outcomes of the Big Data movement is that at last there are some metrics for the scale of corporate data and information resources. Over the last couple of years research has started to be undertaken into the way in which corporate information is being managed and the level of investment in enterprise search. These surveys have been carried out by MarkLogic, Symantec, Smartlogic, Findwise, AIIM, and Oracle.

Quotes from some of these studies include:

In a global survey of 1,375 subscribers conducted in January 2010, 85% of respondents said that information is a key strategic asset, yet only 36% said their organizations are currently well positioned to use information to help grow their business. The disparity at the upper end of the scale was even more dramatic; while almost half—45%—strongly agreed that information is a key strategic asset, only 7% believed they are very well positioned to exploit it. This research study overall makes it clear that making the transition to an information-based economy is not easy. Executives know information is a key strategic asset, that managing it well will provide real value and competitive advantage, but they are not sure how to do that, and there is a certain ambivalence about the role of IT.

Harvard Business Review Analytic Services

40% of respondents say that management at their organizations is either only “slightly” aware or not aware at all of the extent to which unstructured data exists in their enterprises. This lack of management awareness means it may be difficult for data managers to secure the funding and resources needed to properly secure, store, and fully leverage the large volumes of unstructured data coming into their organizations.

MarkLogic (2011)

A majority of respondents report that unstructured data is an essential part of their business, meaning that it may be a component of services or products offered to customers or constituents. At least 57% indicate that unstructured data plays an “extremely” or “very” important role in their businesses. About one out of five, or 18%, consider unstructured data to be at the core of their business.

MarkLogic (2011)

Enterprise search is falling far short of expectations, according to a survey of more than 2,000 directors and managers in the US, UK, Germany and France. More than half (52%) of respondents say they cannot find the information they are seeking using their own organization’s enterprise search facility within what most define as an acceptable amount of time. Nearly two-thirds of those surveyed (65%) define a ‘good search’ as taking less than two minutes to find what they were looking for, but only 48% report being able to achieve that result in their own organization.

Smartlogic (2011)

The Enterprise Search and Findability survey has shown that the majority of the respondents find it difficult to find relevant information within the organization. To be more precise, 60% of the respondents expressed that it is very/moderately hard to find the right information. Only 11% stated that it is fairly easy to search for information and as few as 3% consider it very easy to find the desirable information. The ease of finding the right information clearly has a connection with the size of the organization. When looking at organizations with less than 1000 employees, one can see that 31% of the respondents feel that it is moderately/very hard to find the right information, while the corresponding percentage for organizations with 1001 or more employees is 77%. Nearly half of the respondents (44%) were mostly or very dissatisfied with their search application.

Findwise (2012)

93% of executives believe their organization is losing revenue as a result of not being able to fully leverage the information they collect. On average, they estimate this lost opportunity to be 14% of annual revenue.

Oracle (2012)

Overall the view is that information is becoming more important, more difficult to find, and no one wants to take a lead position in improving the situation.

Before you read on, where would you place your organization in the spectrum of being adequately prepared for information abundance? Knowing the scale of the problem is the first step in finding a solution.

It is time for some definitions. This book is entitled Enterprise Search so what is ‘enterprise search’? Here’s one possible definition:

An enterprise search application enables employees to find all the information that the company possesses without the need to know where the information is stored.

The position I take in this book is that enterprise search is not about selecting and installing a single search application that will index every item of information and data owned by the organization.

In my view enterprise search is about creating a managed search environment that enables employees to find the information they need to achieve organizational and/or personal objectives. It should also include site search for corporate web sites.

Many companies already have one or more search applications, either operating as a discrete search application or embedded into another enterprise application. Trying to replace all of these with one HAL-like enterprise search application is not a realistic strategy. Replacing some of them might be.

Search and Information Retrieval

Information retrieval can be regarded as the science (largely mathematics) behind search. It is a branch of information science and dates back to the mid 1950s. It has been defined as follows:

Information retrieval deals with the representation, storage, organization of and access to information items such as documents, Web pages, online catalogues, structured and semi-structured records and multimedia objects.

There are two different perspectives of information retrieval research. One perspective considers the computer technology of information retrieval, such as ways of building efficient indexes and finding ways to handle multiple languages. The second perspective is user-based, looking at search user interfaces and how people go about constructing search queries. Although there are some very distinguished university departments of information science around the world (many now called iSchools) few teach information retrieval in any depth as an undergraduate course and this means that the annual output of graduates with skills in search implementation is very low indeed. Computer science departments, of which there are many more, also pay little attention to the science and technology of enterprise search. It should also be noted that many of the major IT vendors, such as IBM, Oracle, HP and Microsoft, have a long history of carrying out information retrieval research, as of course does Google.

The scale of the science behind search can be seen in the fact that the standard textbook on information retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto runs to 760 pages of text and almost 2000 references in the bibliography. The definitive book on the design of search user interfaces by Marti Hearst is over 300 pages in length and has around 500 references in the bibliography to research papers on user interface design.

Sadly there seems to be a gulf between the information retrieval community and the enterprise search community. Some of the information retrieval conferences do have a session where papers from the commercial search world are presented. The situation is now starting to change and in the future much closer ties are likely to develop between the information retrieval community and search software vendors and users.

Search Is a Dialog

Earlier in this chapter I remarked on how in conversations we are constantly engaged in a dialog to ensure that we understand what the people we are talking with are trying to convey. It is very important to understand that search is a dialog. We tend to see search as a ‘first strike’ application, as if just putting a search term into Google or Bing gives the search application all that it needs to deliver a page of useful results. The reality is that even on Google we are sometimes prompted on spelling or asked ‘did you mean’. On the left-hand side of the page there may be filters that we can use to narrow down our search, and on public search sites there will be paid-for advertising that also offers solutions to our problems.

We often go into a large department store to find a birthday present, and yet I have never come across a store with a Birthday Present Department. We may look at the store directory (the information architecture) for ideas, but if we are in a hurry we may also go to the Information Desk (the query box) for advice. There we will be asked the age and gender of the person for whom we are buying a present, and what their interests are, in order to suggest one or more specific departments we might wish to explore. Once in the Sports Goods section we may have another conversation with a floor manager about which is the best set of soccer goalkeeping gloves.

The challenge with search, as is the case for the staff of the Information Desk, is that every user is different, with their own individual perceptions of what would make a good birthday present and what would represent value for money. In the business environment the challenge is to find a way of meeting the individual expectations of each member of staff without having to provide them with their own individual search application. Indeed the aim is to make them think that it does work just the way they want it to. Enterprise search is a constant battle between providing personal power and keeping to a price that the company can afford.

Search Has to Be Managed

For over a decade I have been providing consulting services in the management of intranets, and one of the most common issues is who should be taking responsibility for intranet development and operation in a company. An intranet, like search, is a very high touch application, with most if not all employees using the intranet every day. The information on an intranet will be authored by most departments in the company but clearly the people managing the application need to report to a manager who has the budget to support the intranet. The end result is that an intranet can be owned by Corporate Communications, HR, IT, or even Marketing on the basis that an intranet is just another web site.

In the final analysis it should not matter who owns search, and the same situation applies to an intranet. Both should be managed within an overall information management policy and an information management strategy, but very rarely are. Some years ago I went to run an intranet workshop for a major UK organization for which the effective management of information was probably its main competitive advantage. When I arrived I noticed that all the cars were reverse parked, and it looked very neat and tidy. It transpired that the organization was concerned about safety and at the end of the day did not want staff reversing out of a parking space and either crashing into another car or staff walking to their cars. The parking policy was published on the intranet and at the reception desk and it was made clear that a very dim view would be taken if the policy was not followed.

However this organization did not have any policies about the management of information, so almost every document was written in a different format, often with no owner or even a date on the document. The quality of the search experience is directly related to the quality of the content. The old adage of Garbage In – Garbage Out applies to search more than any other application. Someone has to take responsibility for information quality within an overall information management strategy. This is ideally written around an information lifecycle, of which the following is just one example. The term ‘document’ is used just as a convenience; it could mean any item of information from a personal profile to a video file.

1. Create

This is the process of creating documents in a way that enables the document to progress through the stages of the information lifecycle. The steps might include assigning document categories, writing good titles and adding metadata. There could also be a quality assurance process.

2. Store

There are many places that documents could be stored. These might include local and shared drives, document management applications, Lotus Notes applications and intranets. A set of criteria needs to be established so that employees know where documents should be stored so that they can be located and accessed by any employee with permission to do so.

3. Discover

Information can be found by searching through repositories, browsing through folder structures and intranet navigation and through alerting services such as wikis and blogs. Each has a role to play in the discovery process. The process can be facilitated by good usability and the design of intuitive lists.

4. Use

Documents can be used only subject to rules on confidentiality, security and personal privacy. These rules and guidelines need to be established.

5. Share

To be of benefit to an organization documents have to be able to be shared internally, with third parties and with the public. Users of these documents have to be confident that the information they contain can be trusted to be reliable, and that if needed the documents are available in a number of different languages.

6. Review

As documents are shared others may have views on the accuracy and value of the document. Processes need to be agreed for undertaking the review process and if needed creating a new version of the document. A possible decision could be that the document is disposed of to prevent inadvertent use at some time in the future.

7. Record

Some documents will need to be retained in a secure environment for an agreed period of time. Details of retention periods need to be agreed which take into account legal and regulatory requirements, and product and service lifetimes.

8. Dispose

Disposal is the final stage of the information lifecycle. It is the point at which the document has no further value to the company and can be deleted from all systems without any risk to the future integrity of the company.
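The eight stages above can be sketched as a simple ordered model. This is only an illustration under stated assumptions: the names `Stage` and `next_stages` are hypothetical, the stages are assumed to follow one another in the order listed, and the branches out of the review stage (a new version, retention, or disposal) are one reading of the description of that stage rather than a definitive rule.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight lifecycle stages, numbered as in the list above."""
    CREATE = 1
    STORE = 2
    DISCOVER = 3
    USE = 4
    SHARE = 5
    REVIEW = 6
    RECORD = 7
    DISPOSE = 8


def next_stages(stage: Stage) -> list[Stage]:
    """Return the stages a document could move to next (assumed ordering)."""
    if stage is Stage.DISPOSE:
        # Disposal is the final stage, so nothing follows it.
        return []
    if stage is Stage.REVIEW:
        # Assumption: a review may lead to a new version being created,
        # to the document being retained, or to early disposal.
        return [Stage.CREATE, Stage.RECORD, Stage.DISPOSE]
    # Assumption: every other stage simply leads to the next one in the list.
    return [Stage(stage + 1)]


for stage in Stage:
    following = ", ".join(s.name for s in next_stages(stage)) or "(end)"
    print(f"{stage.value}. {stage.name} -> {following}")
```

Writing the lifecycle down in this form makes it easier to ask, for each stage, who is responsible for the document at that point and what has to be true before it can move on.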

Why Search Is Important

The biggest single challenge that any search manager faces is making a business case for a level of investment in search that is appropriate to the requirements of the company. Although the process of making a business case is covered in Chapter 7, let me end this chapter with seven of the main business benefits of good enterprise search.

Capitalizing on information investment

Every day most employees will have spent time on creating information; everything from writing a business plan, sending an email or reporting on a visit to a customer. The process of creation may well be of the order of an hour a day, or 12% of the working year. If this information cannot be found and used by other employees then that time has been wasted twice over, as other employees may have had to create the information all over again. There is also information from external sources, such as market research reports, that has been purchased and will have a company-wide value beyond the original purchaser.

Without enterprise search how much time and effort will be wasted? The outcomes of the 2012 Oracle study quoted above were that 93% of executives believe their organization is losing revenue as a result of not being able to fully leverage the information they collect. On average, they estimate this lost opportunity to be 14% of annual revenue.

Reacting to business opportunities

At a time when business growth is static finding new business opportunities is of the highest importance. When an opportunity does arise the speed with which the company can find examples of relevant experience or size the market potential could make all the difference between winning the business and being a poor second.

If a business opportunity arrived on your desk today how quickly could you respond with a proposal that had low risk and a good financial margin? An enterprise search application could reduce the research time from days to hours, if not minutes.

Making the best use of staff expertise

It is important not to focus just on information but also on knowledge. Knowledge cannot be written down as it is context-specific and changes day-by-day as new knowledge is gained. Typically companies have employee turnover rates of 10% a year. In a company with 5000 employees that means that on average every working day two people arrive at the company to build their careers and enable the company to meet its objectives. How do employees find out who knows what?

How certain can you be that you know every employee that has expertise that would be of value to you? Enterprise search can play a major role in finding them.
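The figure of two new arrivals every working day can be checked with a short calculation. The headcount and turnover rate come from the text. The 250 working days a year and the assumption that every leaver is replaced by a new joiner are not stated in the text and are used here only to make the arithmetic concrete.

```python
EMPLOYEES = 5000              # from the text
TURNOVER_PERCENT = 10         # from the text: 10% a year
WORKING_DAYS_PER_YEAR = 250   # assumption, not stated in the text


def arrivals_per_working_day(employees: int, turnover_percent: int,
                             working_days: int) -> float:
    """Average number of new joiners per working day.

    Assumes headcount stays constant, so each leaver is replaced.
    """
    joiners_per_year = employees * turnover_percent / 100
    return joiners_per_year / working_days


per_day = arrivals_per_working_day(
    EMPLOYEES, TURNOVER_PERCENT, WORKING_DAYS_PER_YEAR
)
print(f"New arrivals per working day: {per_day:.1f}")
```

Under these assumptions 500 people join each year, which works out at two per working day, each of whom needs to find out quickly who knows what.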

Bringing new staff on board more quickly

New employees want to make a positive contribution as quickly as possible. They do not have the time or the inclination to work through the navigation of the intranet or the folder structure in the document management system, nor do they know the names of people that might be useful to them as they begin work.

Employees taking on new roles and responsibilities will be in just the same position, but possibly with a greater need to get up to speed as the expectation will be that they know exactly where all the relevant information will be located. If only!

Speeding the process of acquisition

One of the most significant benefits of enterprise search arises during an acquisition. Once the deal has been done, employees in the acquired company need to have immediate access to the information resources and employee knowledgebase of their new employer. In addition the business case for the acquisition will have been based on the skills and knowledge that the acquired business will bring.

In those crucial early days enterprise search can make a substantial contribution to the rapid and successful integration of the acquired company by quickly indexing the information resources of the acquired company.

Supporting mobile workers

Many employees will be working outside of the office, dealing with customers, prospects and suppliers. They will need information as the meeting is taking place to confirm the details of a product or the name of a subject-matter expert in the company.

Mobile users will use enterprise search on their smartphone or tablet to find information on a close-to-instantaneous basis and close the deal.

Reducing workplace stress

Routine tasks are rarely routine. New policies emerge and new forms are devised to capture information. Of course what is a routine task for a long-serving employee is not routine for someone new to the company or the role. In both cases there never seems to be enough time to complete the tasks.

Embedding search into a task can ensure that as the task is undertaken the most recent information is presented to the employee by the enterprise search application working in the background as a search-based application.


All the evidence suggests that organizations are ill-prepared for the rate of growth of information they are experiencing. Because information is not seen as a corporate asset, with an associated information management strategy, organizations have no view on the scale of the problem. As a result no one is taking ownership of the problem because ‘there is no problem’. According to a survey carried out by Oracle the result is that there could be a 14% loss of revenues for the corporate sector. Seeing ‘enterprise search’ as the quest for a single search application that can index all organizational information is not the solution. Enterprise search is about creating a managed search environment that enables employees to find the information they need to achieve organizational and/or personal objectives. There will be many different business cases that need to be addressed within this managed search environment, each contributing to the overall investment case.

Further Reading

A list of books and blogs can be found in Appendix A at the end of this book with a section pertinent to this chapter in the Further Reading section.
