How should a “Perfect” Search project be run?

What follows is a post that I recently published on AIIM’s site as an “Expert Blogger”. (The original can be read here)

———————————————————————–

How should a “Perfect” Search project be run?

It was Friday evening, and Charlie was meeting his friends for a drink. They all worked in IT and had, between them, years of experience, especially in the area of enterprises and enterprise search, and liked to get together to catch up with what each was doing.

After a few pints and small talk, Charlie said “Guys, what do you all reckon would be the best way to construct a large-scale enterprise search project?”

Martin, who had had quite a lot of experience in this area, looked up and said “The main thing is that you shouldn’t underestimate what is required to get the best from a search investment.”

Charlie nodded in agreement. “But how can we help the client understand what sort of a commitment is needed?”

Ken suggested using an Agile/Scrum approach for the analysis of what the client needed as well as the development of the search UI.

“Hear, hear,” called out the others. Otis took the chance to follow that up with “You need someone who really understands what search is all about.” Martin glanced at him, and nodded. Otis carried on. “Someone who cares about search metrics, and knows what changes need to be made to improve them.”

Jan chimed in. “I agree with you on some points. You’ve got to make sure that you include all the stakeholders, and also educate the customer. Get everyone in the same room, and start with the big picture, narrowing it down to what is actually required. And, yes, create demos of the search system using ‘real data’. It helps the customer understand the solution better. However,” he continued, “I’m still careful about forcing a Scrum approach on a customer that might be unfamiliar with it.”

Stephanus put down his glass. “I’ve just finished a Phase I implementation at a client. The critical thing is to make sure that you set the client’s expectations and get buy-in from their technical people, especially in security and surfacing. And I agree with Jan. There are still a lot of companies that don’t use Agile, or Scrum, at the moment.”

Sitting next to Stephanus was Helge. He began to speak. “There are a few important things. Make sure you’ve got Ambassadors – people who really care about, and promote, the project. And ask the important question – ‘How can the search solution support the business so that they can become more competitive?’ It might be necessary to tackle this department by department. Get the business users and content owners together, but as Stephanus just said, don’t forget IT. And also make sure that the governance of the system is considered.”

Stephanus smiled. “Yes – the workshop idea is a definite must.”

Gaston, who was sitting next to Charlie, said “An Agile approach has worked for me in the past. Creating prototypes is important. Most clients don’t know what they want until they see something tangible.” “Ok” said Charlie, “how has that worked?”

Gaston continued. “Build a small team consisting of a UI designer, a developer, a search engineer, someone from the IA team, and no more than two of the business users. Having someone there from QA is also handy. Start with a couple of day-long workshops to go over project objectives, scoping and requirements gathering. Use one-week sprints, and aim to produce a workable prototype in each. At the end of the week, schedule a time where the prototype can be demoed. The point is to get feedback about what is working, and what the goal for the next sprint should be.”

Mike, the last one in the group, looked around at everyone, and then back at Charlie, and said, “Charlie – there’s a lot of great advice here. One important thing to remember is that you have to work with the client to ensure that the search solution is part of the strategy. As the others have already mentioned, work with the client and educate them. Getting all the stakeholders together for some common education, collaboration and planning can really go a long way towards getting the buy-in and commitment needed for a successful project. It is also great for setting expectations and making sure everyone is on the same page.”

Charlie was impressed. He had some pretty smart friends. “Thanks guys. You’ve all had some excellent points. Let me buy you all another round”.

The above “conversation” was all based on a discussion in LinkedIn. (Click here to read it).
Many thanks to the contributors in that discussion who graciously allowed me to write this post:

Why giving the users what they want is not enough – the Importance of communication

What follows is a post that I published on AIIM’s site as an “Expert Blogger”. (The original can be read here)

———————————————————————–

Why giving the users what they want is not enough – the Importance of communication

As you are all most likely aware, giving the users what they want is not always the right thing. Why? Because, often, the users don’t really know what they want.

Consider the following example:

A large restaurant chain has restaurants across the globe. Each restaurant needs to maintain documentation such as its construction plans, recipes, procedures and methodologies. The “critical” documents are kept in a legacy ECM system and several SharePoint doclibs store the non-critical documents. These systems are located centrally, and are all globally accessible.

The business users worked primarily with the legacy ECM system, but often also needed to work with the documents in SharePoint. When a document was needed, a search was done either in SharePoint or in the legacy system, using its rather complicated search feature.

Performing searches in two different places wasn’t easy, or efficient. And so, the users cried out “Give us one central place where we can perform a search.” When asked for more details, the business users replied “Make it like Google”.

The restaurant’s IT-people (who might have been a little too enthusiastic) swung into action, without any more questions. They found a tool that would allow SharePoint to “talk” with the legacy ECM system and crawl all the documents, indexing everything it could.

After working many weeks getting things set up, and configured, the IT-people sat and watched as SharePoint crawled through the content. Once finished, initial tests were done to ensure that a search action would actually return content. It was working perfectly. And it was “just like Google”.

A demonstration of the Search system was given to the users, who were ecstatic. They were able to easily enter search terms, and get results from the SharePoint doclibs as well as the legacy system’s repositories. It was fantastic. It was easy to use, and there was no extensive training required. There was much cheering and showering the IT-people with small gifts. After further testing, the search facility was officially moved into production.

For the first couple of months the users were keen to use the “enterprise search facility”. But then, gradually, complaints started being heard: “The search results contain too many hits”, “Why isn’t it more like the search feature in the legacy system?”, or “The search results just show the title of the document.” Users went back to using the legacy system’s search feature for the “important” documents, and the SharePoint search was just used for the documents in the document libraries. In short, the “central” search facility was a failure.

What had gone wrong here? The business users wanted a single search facility, and they wanted it “like Google”. And that’s what the IT department had delivered – there was a single box where users could type in the words they wanted to find, and the search would return documents from all the different document repositories.

In this case, however, the users didn’t really know what they wanted. Yes, they wanted “easy”, but they also wanted something that allowed granular searches to be done (just like their “old” search tool). They also wanted to know where the search results came from. And they wanted the “important” documents to appear at the top of the search results.

The IT team should have asked more, then listened more, and then repeated the process until it was understood what the business really needed. The team had followed a Waterfall approach, where requirements were gathered up front and then not allowed to change. Agile techniques could have been used instead, where a “finished” product is shown to the users several times during the project. The users could then give feedback, which would lead to a better understanding of what they wanted, as well as the ability to refine the solution.

Fortunately, the IT team had the opportunity to improve the search system. They added a small button to the search results screen, where users could provide immediate feedback. Working with this, as well as sending out regular “satisfaction” questionnaires, the IT team was able to identify areas for improvement. These included not only changes that were required to the user interface and results screen, but also further refinements needed in the indexing process. Every four months, the improvements were presented to the business, and then implemented.

Now, the business users don’t use anything else.


Is True Enterprise Search actually possible?

What follows is the first post that I published on AIIM’s site as an “Expert Blogger”. (The original can be read here)

———————————————————————–

The idea of “Enterprise Search” is an attractive one. It certainly would be worth its weight in gold to have a single search location where key words can be entered, and within a fraction of a nanocentury[1], results would be displayed that include both structured, and unstructured, content from across the numerous repositories, silos, systems, archives, file shares, cabinets, clouds, etc, etc.

But is true Enterprise Search really possible? I know there are several tools that provide “Enterprise Search” functionality, but these usually allow you to search over a fixed number of different repositories, usually containing similar data. Maybe it’s a set of defined documents, or a database, or similar. You certainly get the opportunity to make available content from disparate sources, but can you consider that “enterprise”?

If you consider what’s involved to search across the “Enterprise”, it should be quite easy, right?

Well…consider this:

1. First off, you need to be able to identify where your structured, and unstructured, data and content is. Remember, here we are dealing with the complete enterprise, so don’t forget that this includes file shares, hard drives, database systems, ERP systems, ECM systems, etc, etc. And what happens if new “sources” are added?

2. Next, you need to know what sort of content you have. Can the Enterprise Search application “read”, or parse, the data/content you have? There certainly are ways to make this possible. You can install an IFilter, for example. But you’ll need one for every format that you have in your enterprise.

3. You need a way that your Search application can connect to all of the different “sources”. In principle this is, again, possible. (However, I would imagine that this would require a lot of configuration).

4. How frequently is your data, and content, changing? For example, in an ECM system, is the content constantly being changed (as new documents are added)? Maybe several major and minor versions are kept of each document. Do you need to index all versions, or only the latest? What about data in your ERP system? How up to date do you want your search results to be? Do you just keep continuously indexing?

5. Security. Do you want users to be able to see results for data, or content, that they would not have rights to if they had used the native application? If there are disparate security systems in place, how do you translate ACLs from them into a common format? Do you use “early binding” (storing the permissions in the index at crawl time), or “late binding” (checking with the source system at query time)?
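To make the last point a little more concrete, here is a minimal Python sketch of the two security-trimming options. It is only an illustration under my own assumptions: the `IndexedDoc` structure, the `group_map` translation table and the `can_read` callback are all hypothetical names, and no particular search product works exactly this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    source: str                # e.g. "sharepoint" or "legacy-ecm" (illustrative names)
    text: str
    allowed_groups: frozenset  # ACL already translated to common group names


def translate_acl(source, native_principals, group_map):
    """Map source-specific principals to common group names.

    group_map is a hypothetical lookup keyed by (source, native principal);
    principals without a mapping are dropped, i.e. access is denied by default.
    """
    return frozenset(
        group_map[(source, p)] for p in native_principals if (source, p) in group_map
    )


def matches(doc, term):
    # Toy relevance test: case-insensitive substring match.
    return term.lower() in doc.text.lower()


def search_early_binding(index, term, user_groups):
    """Early binding: trim using the ACLs captured in the index at crawl time."""
    return [
        d.doc_id for d in index
        if matches(d, term) and d.allowed_groups & user_groups
    ]


def search_late_binding(index, term, user, can_read):
    """Late binding: ask the source system about every hit at query time.

    can_read(source, doc_id, user) stands in for a call to the native system.
    """
    return [
        d.doc_id for d in index
        if matches(d, term) and can_read(d.source, d.doc_id, user)
    ]
```

The trade-off shows up even in a toy like this: early binding is a cheap set intersection but is only as fresh as the last crawl, while late binding is always current but costs one permission check per hit.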

As you can see, it’s not that simple.

Until we have a way to be able to “capture” all information from an undefined number of sources, with an undefined number of data, and file, formats, with disparate sets of ACLs, I return to my opening question: “Is True Enterprise Search actually possible?”

What are your thoughts on this?


[1] A nanocentury is approximately 3.156 seconds

ECM Noir – Killa Hertz & The Case of the Missing Documents – Part 7

…continued from Part 6 –  [Other Episodes]

Killa Hertz’ friend Mike Budrewski had analysed the eResults logs, and had determined that there was a Java memory error. Killa was investigating further.

“Trudy – I need to look at the web server.” Trudy looked up with a puppy dog look in her eyes. She quickly opened up a new remote session, and logged me onto the web server. “OK”, I said to myself, “somehow this thing is throwing a memory error.” I fired up the task manager. The thing was using a little more memory than normal, but it looked OK.

Suddenly my mobile phone rang. Trudy jumped. The girl was skittery. I answered the phone, and heard Mike’s voice. “Killa, I was able to find some documentation about this eResults application. There’s nothing explicit about the error, but it clearly states that it requires 2 gigabytes of RAM. How much is that thing running?” “1 gig” I replied. I was happy. It looked like an open-and-shut case. At the same time I was annoyed. Why the hell was a law firm skimping on things like memory?

I looked over at Trudy – she was busy staring at numbers in a spreadsheet. “Trudy – this server doesn’t have enough memory. What can you do to get another gig installed?” She looked up. She wrinkled her nose, and trundled her chair next to mine. Her perfume was clearly set on “Kill” this morning.

“Umm, let’s have a look.” The web service server was a virtual one. That meant that, in principle, it should be easy to increase the memory. “Yes” she said, in that excited voice of hers. “However, I’ll have to let the boss know. We’ll probably need to take the server off-line.” I went and grabbed a coffee while she called her boss.

After 5 minutes, she came into the coffee room. “Sure thing Killa, we can do it tonight.” I put down my cup. “Trudy – how long does it take to crawl all the documents in the docbase?” “Well…” she started. “The last time we did it, it took about a week.” Suddenly, I had the urge to be sitting on a stool at O’Learys with a glass of Jack.

“A week is a long time to see if this is going to work.” Even though I was getting paid by the day, there were still limits.

“Let’s see if we can split up the load.” Trudy’s eyes opened wide. She was a good kid. “Look – you’ve got over 800,000 documents in there. We’ll split up the documents into smaller groups. Then we start a crawl on each group of documents. If this memory increase doesn’t help, and a crawl doesn’t work properly, then it doesn’t mean we have to recrawl all the documents.”

Trudy ran over to her desk, and grabbed a pad of paper (it had roses in the corner of each page) and a pen. “Let me get this down” she said.

“Ok Trudy, let me show you what needs to be done.”

to be continued…

Part 8

ECM Noir – Killa Hertz & The Case of the Missing Documents – Part 6

…continued from Part 5 –  [All Episodes]

Killa Hertz had worked through the night with help from Trudy. They had gone through the indexing process. It looked like the answer could be in the eResults log. Killa had sent it to his super-geek friend Mike to see if he could make sense of it.

The alarm clock went off at 8am. Swinging my arm I knocked the thing off the bedside table. Being electric, it just kept beeping. I pulled the plug out of the wall.

After leaving Trudy’s office last night, I made a phone call. My friend Mike was awake. I expected that. He liked his internet games. I swung past his place with the CD. Trudy had made sure that there was only the eResults log on it.

Mike invited me to stay while he analysed the log.  His flat was small, and messy, and there was no bourbon. I declined. “Mike – call me in the morning when you have an answer.”

So now – it was morning. Still hot, and as sticky as it was last night. After swallowing two cups of coffee, I headed into Trudy’s office. She was there looking at the system. “Hi Killa!” she squeaked far too enthusiastically. I hate morning people. “Have you heard anything?”. I told her that Mike would call me as soon as he had news for me.

“But you know Trudy, it could be that the system is choking while it’s doing the indexing. Let’s have another look at it.”

She logged onto the system for me, and then let me sit in her chair. I had a look at the Crawler Impact Rules in SharePoint. There were none. I poked around and checked out a few other things. The system was 32-bit. Not the best, but that didn’t explain why the crawls were suddenly stopping short. There were a few settings in the registry that could be tweaked to increase the amount of memory used. But, again, no point changing those…yet. I made note of them anyway.

Around 9:30, my cell phone rang. It was Mike. He wanted me to come around.

Knocking on his door, I was met by Mike in the same clothes that he had on the last time I saw him. He was talking fast. Clearly a sign of too many caffeine-loaded energy drinks. I didn’t want to be around when those wore off.

Mike pulled a stool over next to his chair. The computer screen was filled with the error logs. “I looked through the logs, Killa. There’s a hellova lot of information in there. I went through each line. This is a smart app.” I could hear that Mike was impressed. “There are a lot of errors, but they are nothing to be worried about. It looks like the system is just reporting that it couldn’t find certain things. These don’t look like they are causing the crawl to fail. I double-checked them anyway. It took me a while, but about an hour ago I think I finally pinned it down.”

I glanced at Mike. He liked his moment of importance. “So what do ya think it is?” I asked. “Memory,” he said. “But their SharePoint system is running fine,” I said. “No – not the SharePoint server – it’s a Java error.”

“I need coffee” I said. His response was to thrust a can of energy drink in my hand. It was better than nothing.

I thought back over the process. SharePoint indexed the docs. But that didn’t use Java. The documents were transferred in batches from the Documentum docbase to the SharePoint server first. And this was via a web server that did use Java.

“Mike – I’ve gotta go check something. I’ll call you.” Mike handed me a pile of paper. It was a printout of the error log with the Java error highlighted. “As always – Thanks”.

I arrived back at Trudy’s office. “Trudy – give me access to your web server.”

to be continued…

Part 7

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set DedicatedFilterProcessMemoryQuota = 200000000 Decimal
  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set FilterProcessMemoryQuota = 200000000 Decimal
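For anyone who would rather script those two settings than click through regedit, here is a small Python sketch that writes them out as the text of a `.reg` file. It assumes both values are DWORDs (the “Decimal” above suggests the DWORD edit dialog, but that is my assumption), and the `render_reg` helper is a made-up name for this illustration. It only builds the text; nothing is written to the registry.

```python
# Key path and value names are taken from the two bullets above.
KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server"
    r"\12.0\Search\Global\Gathering Manager"
)
QUOTA = 200_000_000  # decimal, as listed above

SETTINGS = {
    "DedicatedFilterProcessMemoryQuota": QUOTA,
    "FilterProcessMemoryQuota": QUOTA,
}


def render_reg(key, values):
    """Return .reg file text setting each name to a DWORD value (assumed type)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {number} does not fit in a DWORD")
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg(KEY, SETTINGS))
```

As with any registry change, export the existing key first and try it on a test server before touching production.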

Google, 1997, Pagerank & Barrels

Did you know Google originally used barrels?

Here’s an overview of Google’s system architecture (at least as it was in 1997)

Can you see the barrels?

Messrs Sergey Brin and Lawrence Page presented a paper at a conference back in 1997. According to this, once a page has been indexed, the word occurrences are stored in “barrels”.
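To get a feel for the idea, here is a toy Python sketch. It is my own simplification, not code from the paper: each word gets a numeric wordID, each barrel covers a range of wordIDs, and a barrel can later be re-sorted by wordID to give an inverted index. The function names and the `words_per_barrel` parameter are made up for the illustration.

```python
from collections import defaultdict


def index_pages(pages, words_per_barrel=4):
    """Distribute word occurrences into barrels keyed by wordID range.

    pages maps a docID to its text. Returns the lexicon (word -> wordID)
    and the barrels (barrel number -> list of (docID, wordID, positions)).
    """
    lexicon = {}
    barrels = defaultdict(list)
    for doc_id, text in pages.items():
        hits = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            # New words get the next free wordID.
            word_id = lexicon.setdefault(word, len(lexicon))
            hits[word_id].append(position)
        for word_id, positions in hits.items():
            # The barrel is chosen by the range the wordID falls into.
            barrels[word_id // words_per_barrel].append((doc_id, word_id, positions))
    return lexicon, dict(barrels)


def invert(barrel):
    """Regroup one barrel by wordID: wordID -> list of (docID, positions)."""
    inverted = defaultdict(list)
    for doc_id, word_id, positions in barrel:
        inverted[word_id].append((doc_id, positions))
    return dict(inverted)


if __name__ == "__main__":
    lexicon, barrels = index_pages({1: "the quick brown fox", 2: "the lazy dog the"})
    print(invert(barrels[0])[lexicon["the"]])
```

Because each barrel only holds a slice of the vocabulary, it can be sorted and searched independently of the others, which is the appeal of splitting the index up this way.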

More about these barrels, the PageRank algorithm, and how Google did what it did (at least in 1997) can be read here: The Anatomy of a Large-Scale Hypertextual Web Search Engine
