Thursday, August 22, 2013

SharePoint Search - What cannot be found?

SharePoint 2010 has awesome search capabilities! That being said, it is still SharePoint. You have to remember that SharePoint is a platform. Platforms are built with greatness in mind... so it is not a custom solution for your specific problem!

Like all platforms, SharePoint's search capabilities have their limits. If you know them, you can work around them. This blog post is about what you can and cannot do with search.

Consider the following scenario: Document Management
First a little story so we can pinpoint the requirements... and some cannots!
So you have your SharePoint environment all configured to work as a Document Management System (DMS). How awesome is that!

So how does this DMS work?

You upload your documents to the drop-off library. You configured a content organizer rule (or 500 of them, not counting your Records Center) that makes sure your document automagically moves to the correct document library.
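
To make the routing idea concrete, here is a minimal Python sketch of what a content organizer rule does conceptually. It is not SharePoint code: the `Rule` class, the `route` function, the library names and the way priority is handled are all made up for illustration, so treat it as a thinking aid only.

```python
from dataclasses import dataclass
from typing import List, Optional

DROP_OFF = "Drop Off Library"  # hypothetical name for the drop-off library


@dataclass(frozen=True)
class Rule:
    """One hypothetical routing rule: a content type plus an optional property match."""
    content_type: str
    target_library: str
    property_name: Optional[str] = None
    property_value: Optional[str] = None
    priority: int = 5  # assumption of this sketch: the lowest number wins

    def matches(self, doc: dict) -> bool:
        if doc.get("content_type") != self.content_type:
            return False
        if self.property_name is None:
            return True
        return doc.get(self.property_name) == self.property_value


def route(doc: dict, rules: List[Rule]) -> str:
    """Return the library the document should end up in.

    A document that matches no rule simply stays in the drop-off library.
    """
    candidates = [rule for rule in rules if rule.matches(doc)]
    if not candidates:
        return DROP_OFF
    return min(candidates, key=lambda rule: rule.priority).target_library


rules = [
    Rule("Complaint", "Complaints"),
    Rule("Contract", "Contracts - Sales", "department", "Sales", priority=1),
    Rule("Contract", "Contracts - General"),
]

complaint = {"name": "complaint-001.docx", "content_type": "Complaint"}
sales_contract = {"name": "c-17.docx", "content_type": "Contract", "department": "Sales"}
unknown = {"name": "scan.pdf", "content_type": "Document"}

for doc in (complaint, sales_contract, unknown):
    print(doc["name"], "->", route(doc, rules))
```

The last document is the interesting one: nothing matches, so it never leaves the drop-off library.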

Your DMS also supports Document IDs. So the document you created in your drop-off library gets a unique, one-of-a-kind ID.

Now, some business joker wants to locate 'his' document. Geez, the nerve of some people. So, where is it? You uploaded one document? Well, give me 5 minutes and we'll have it located! But what if we are working in a reasonably sized company and we are processing 5,000 or more documents per month? That's about 170 documents per day.

SharePoint Search to the rescue! You have content organizer rules in place, so you know your content (otherwise you cannot create good content organizer rules). Let's say the document that the user wants to locate is a complaint.

Try using the CQWP in such an environment :-) Look at this blog about that hell! So you configured search to display all unhandled complaints and... nothing?! WTF?

So why is that? Dunno :-) but perhaps it is because of one of the following limits, boundaries or other functionalities:
  • Documents in the drop-off library aren't crawled
  • Checked out documents aren't crawled
  • Search can only find what it crawled via an incremental or full crawl.
So what does this mean in real-life scenarios:
  • The document was uploaded to the drop-off library but wasn't (yet) indexed, so SharePoint does a check-out and the complaint will not be processed!
  • The document wasn't indexed correctly at the drop-off library, so SharePoint does a check-out and the complaint will not be processed! Documents in a drop-off library aren't visible to search and thus the complaint cannot be found by the end user!
  • The document was uploaded via an automated process with incorrect metadata and thus it is again checked out.
  • The document was uploaded via an automated process with correct metadata but the content organizer rules didn't run yet (scheduled daily by default).
  • The document was processed correctly but
I mentioned the Document ID feature earlier. When you create a document center there is a little box in which you can enter a specific Document ID. This box uses search capabilities, so all the issues/functionalities mentioned above apply to this little gem also!
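
The three limits listed above can be written down as a small check. This is a Python sketch of the reasoning, not of how the crawler works internally: the `Document` class, the `why_not_found` function and the example dates are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

DROP_OFF = "Drop Off Library"  # hypothetical library name


@dataclass
class Document:
    name: str
    library: str
    checked_out: bool
    last_modified: datetime


def why_not_found(doc: Document, last_crawl: datetime) -> list:
    """Return the reasons search cannot return this document (empty list = findable)."""
    reasons = []
    if doc.library == DROP_OFF:
        reasons.append("still in the drop-off library")
    if doc.checked_out:
        reasons.append("checked out")
    if doc.last_modified > last_crawl:
        reasons.append("changed after the last crawl")
    return reasons


last_crawl = datetime(2013, 8, 22, 6, 0)
stuck = Document("complaint-001.docx", DROP_OFF, True, datetime(2013, 8, 22, 9, 30))
routed = Document("complaint-002.docx", "Complaints", False, datetime(2013, 8, 21, 16, 0))

print(stuck.name, why_not_found(stuck, last_crawl))
print(routed.name, why_not_found(routed, last_crawl))
```

When a user reports a missing document, walking through these three questions in this order is usually a quicker start than digging through the logs.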

Sources:

http://blogs.technet.com/b/speschka/archive/2009/10/30/sharepoint-2010-content-organizer-part-1-a-cool-new-feature-for-managing-your-content.aspx
http://blogs.technet.com/b/speschka/archive/2009/10/30/sharepoint-2010-content-organizer-part-3-the-rules-engine.aspx
http://groundhog667.blogspot.nl/2013/08/sharepoint-2010-content-query-web-part.html
http://roykimsharepoint.wordpress.com/2011/05/28/content-organizer-feature-for-large-site-hierarchies/

Information management - Deployment

This blog is all about Deployment of your solution!

First things first. Where are we in the process? When we are ready for deployment we have already covered (in one form or another):
- Information architecture - Done!
- Logical architecture - Done!
- Physical architecture - Done!
- Installation - Done!

Woohoo! So now we can implement our solution, finally!

You have several options for a SharePoint deployment:
- Manual
- Automated
- Mixed

Manual deployment
A manual deployment is exactly what it says: a manual deployment. You ask somebody to create the required objects manually, i.e. through the SharePoint user interface, like:
  • Web apps
  • Site Collections
  • Sites
  • Content Types
  • Site Columns
  • Libraries
  • SharePoint Groups
  • Activate features
  • Configure (SharePoint) settings
  • etc.
Requirements for a manual deployment:
  • Accurate information architecture
  • Extremely organized and punctual consultant(s)
  • Test environment
  • Time!
One of the key takeaways here is having a specific persona in your team. One that is able to replicate exactly what is written in the information architecture and is able to spot mistakes. I have been working with SharePoint for more than 7 years and haven't seen a 100% accurate information architecture with a matching logical architecture.

Another key aspect of this type of deployment is time. Manually creating and testing all objects can take days or even weeks! This, of course, depends on your implementation. Think about a DMS solution with 1-5 site collections, 40 content types, 20 or so document libraries and content organizer rules that move documents through your system.

Automated deployment
An automated deployment is a deployment where you use pre-configured scripts or compiled code to create the objects mentioned above.

You could create SharePoint features using Visual Studio that deploy web parts, content types, libraries, etc. Another way is by using PowerShell scripts.
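
Whichever tool you pick, the script is only as good as the information architecture it is fed. The Python sketch below shows that idea and nothing more: it turns a (hypothetical, heavily simplified) architecture description into an ordered list of steps and refuses to continue when the description contradicts itself. The URL, column names and the `build_plan` function are made up. In a real farm each step would be carried out by a feature or by PowerShell cmdlets such as `New-SPWebApplication` or `New-SPSite`, which is not shown here.

```python
# Hypothetical, simplified information architecture used only for this sketch.
ARCHITECTURE = {
    "web_application": "http://dms.contoso.local",  # made-up URL
    "site_collections": ["/sites/dms"],
    "site_columns": ["DocumentManager", "ComplaintStatus"],
    "content_types": {"Contract": ["DocumentManager"], "Complaint": ["ComplaintStatus"]},
    "libraries": {"Contracts": ["Contract"], "Complaints": ["Complaint"]},
}


def build_plan(arch: dict) -> list:
    """Turn the architecture into an ordered list of (action, target) steps.

    Order matters: containers first, then site columns, then the content
    types that use those columns, then the libraries that use the content types.
    """
    plan = [("create web application", arch["web_application"])]
    plan += [("create site collection", url) for url in arch["site_collections"]]
    plan += [("create site column", column) for column in arch["site_columns"]]

    for content_type, columns in arch["content_types"].items():
        missing = [c for c in columns if c not in arch["site_columns"]]
        if missing:
            raise ValueError(f"content type {content_type!r} uses undefined columns: {missing}")
        plan.append(("create content type", content_type))

    for library, content_types in arch["libraries"].items():
        missing = [c for c in content_types if c not in arch["content_types"]]
        if missing:
            raise ValueError(f"library {library!r} uses undefined content types: {missing}")
        plan.append(("create library", library))

    return plan


plan = build_plan(ARCHITECTURE)
for number, (action, target) in enumerate(plan, start=1):
    print(number, action, target)
```

The validation is the part worth stealing: an inaccurate information architecture fails here, on paper, instead of halfway through a deployment.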

Requirements for an automated deployment:
  • Accurate information architecture
  • Development team
  • Development environment
  • Test environment
There is a new requirement here: a development environment. I personally do not think it is necessary in a manual deployment, but this is very arguable, I agree!

The requirement 'time' isn't on the list. You've already spent it during the development process, building the features or scripts. Deployment should be a breeze....

Mixed deployment
Of course it is perfectly feasible to use a mixed deployment, e.g.:
Creation of web applications, site collections, content databases, content types, site columns, libraries etc. is done using PowerShell scripts.
Creation of custom web parts is done via custom developed features.
Content organizer rules are created manually.
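
The split in the example above is easy to keep track of when you write it down as data. A small Python sketch, assuming you keep a work list of (object type, name) pairs; the method labels, the `split_by_method` function and the example work items are this sketch's own inventions.

```python
from collections import defaultdict

# Assignment taken from the example above; the labels are hypothetical.
METHOD_BY_OBJECT_TYPE = {
    "web application": "powershell",
    "site collection": "powershell",
    "content database": "powershell",
    "content type": "powershell",
    "site column": "powershell",
    "library": "powershell",
    "custom web part": "feature",
    "content organizer rule": "manual",
}


def split_by_method(work_items):
    """Group (object_type, name) pairs by deployment method.

    Unknown object types fall back to 'manual' so nothing is silently skipped.
    """
    grouped = defaultdict(list)
    for object_type, name in work_items:
        method = METHOD_BY_OBJECT_TYPE.get(object_type, "manual")
        grouped[method].append((object_type, name))
    return dict(grouped)


work = [
    ("site collection", "/sites/dms"),
    ("content type", "Contract"),
    ("custom web part", "My Contracts"),
    ("content organizer rule", "Route complaints"),
    ("workflow", "Approve contract"),  # not in the mapping, so it lands on the manual pile
]

for method, items in split_by_method(work).items():
    print(method, items)
```

The manual pile is the one to watch: everything on it needs that extremely organized consultant and time.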

Requirements for a mixed deployment:
  • Accurate information architecture
  • Development team
  • Development environment
  • Test environment
  • Extremely organized and punctual consultant(s)
  • (Time)
With a mixed deployment you need, of course, all the requirements above!

Wednesday, August 14, 2013

SharePoint 2010 Content Query Web Part demystified


Content Query Web Part
This blog is more about architecture & information management in a real world SharePoint environment. Since content aggregation is part of architecture & information management I decided to share my findings on the elusive Content Query Web Part a.k.a. CQWP.

The story
According to Microsoft the CQWP can be used to show aggregated content:
  • over a single list
  • over a site and all subsites
  • over a site collection
The story demystified
The above is accurate... although not complete.
Imagine this real world scenario. Contoso is a medium sized company with approx. 25,000 documents. These documents reside in a single Site Collection and are spread over multiple document libraries. Of course we implemented a thorough information architecture and thus all documents are neatly classified in terms of content type and metadata.

Now we have a business requirement: "I want to see all contracts assigned to me.".
Further, an architectural principle states only OOB functionality is allowed.

And we are off.... Let's skip a couple of steps to the point where the CQWP is the way to go for this requirement (OOB it is... really... trust me... no, you cannot use search).

The CQWP: What does it do and where can I find it?
The CQWP is an out of the box web part that, as stated above, enables you to aggregate data. We need aggregation over multiple libraries and we need filtering. Before this little 'gem' makes its appearance you need to enable it. This is done via the activation of the publishing infrastructure feature.

Analysis
The contracts are stored in several different libraries. Perhaps due to sheer numbers, authorization, etc.

Implementation
So we create a new page, add the CQWP.
Next part is the configuration. The 3 elements we are going to focus on here are:
  • Source
  • Filter
  • Sorting
First off the source. Since we are testing the solution we are setting the source to aggregate over a single list.
Content Type = Contract
The filter is DocumentManager (people picker field type) and its value is set to '[Me]'.
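
In plain terms, this configuration asks for "every Contract whose DocumentManager is the person looking at the page". Here is a Python sketch of that selection, assuming a list of items represented as dictionaries; the titles and user names are made up, and the exact content type match is a simplification (the real web part also matches child content types).

```python
def my_contracts(items, current_user):
    """Keep items of content type 'Contract' whose DocumentManager is the current user."""
    return [
        item
        for item in items
        if item["ContentType"] == "Contract" and item["DocumentManager"] == current_user
    ]


# Made-up example library; 'alice' plays the role of '[Me]'.
library = [
    {"Title": "Lease 2013", "ContentType": "Contract", "DocumentManager": "alice"},
    {"Title": "Supplier agreement", "ContentType": "Contract", "DocumentManager": "alice"},
    {"Title": "NDA", "ContentType": "Contract", "DocumentManager": "bob"},
    {"Title": "Complaint 42", "ContentType": "Complaint", "DocumentManager": "alice"},
]

titles = [item["Title"] for item in my_contracts(library, "alice")]
print(titles)  # ['Lease 2013', 'Supplier agreement']
```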

Run the page and voila we have a result of 2 contracts (just an example). As a wise man once said: WOOHO! We have a(n almost) working solution!

Now, change the source of the web part to aggregate everything over the site collection and:
CRASH!


What just happened? Where is my nicely aggregated list of contracts?

CQWP internals
At this point we need to start debugging the solution. Review filter settings, (if you didn't already) enable trace logging, etc. etc.

As you perhaps already know, the CQWP runs a CAML query that you probably want to analyze. Glyn Clough has written a nice blog on just how to do that! Of course we are using the ULS Viewer to analyze the huge amounts of ULS logs created. Somewhere in there you will find something like:
"xxxxxxx* w3wp.exe (0x0C98) 0x16F0 Web Content Management Publishing 7352 Warning ...entTypeId" Nullable="True" Type="ContentTypeId"/><Value Type="ContentTypeId">0x0101</Value></BeginsWith><Eq><FieldRef ID="{f366697d-21a6-493e-af7d-9b3cf5410ea4}" Nullable="True" Type="User"/><Value Type="User"><UserID/></Value></Eq></And></Where><OrderBy></OrderBy></Query>' generated the following error:The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator. at the following url: XXXXX. Web Part title: Content Query b8210847-657f-40f3-80af-22bf444ad8f8"

The good, the bad and the ugly
The good part about this log is that it is extremely clear and straightforward! The CQWP is prohibited from displaying the results because the query exceeds the list view threshold (LVT). As you can read in SharePoint 2010 capacity management, the default value of the LVT is 5,000 items.

The bad part about this is that this setting isn't there because it looks really cool; it has a real purpose. Try cranking it up to, let's say, 30,000 and let half a dozen users try to access the page. You can actually see the performance penalty it causes.

The ugly part... now what? The CQWP should only return a couple of contracts, not even close to 5,000.

Indexed columns
As it turns out the columns you use with filtering AND sorting need to be indexed columns, according to Microsoft.

The ultimate catch
Try adding more than 5,000 items to a single document library, filter on a single column (using the CQWP) and give it a shot! When you use an indexed column as a filter it will return results. When you remove the index from that column on that specific list it will not return results and will spit back the log mentioned above.
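
Why would an index matter when only a couple of items come back? Because the threshold is about the number of items a query has to process, not the number it returns. The Python sketch below simulates that for a single list. It is a toy model, not SharePoint's query engine: the `query` function, the `ThresholdExceeded` exception and the 6,000 fake rows are all assumptions for illustration.

```python
LIST_VIEW_THRESHOLD = 5000  # the default mentioned above


class ThresholdExceeded(Exception):
    """Raised when a query would have to process too many rows."""


def query(rows, column, value, indexed_columns):
    """Return the matching rows, refusing when too many rows must be processed."""
    matches = [row for row in rows if row[column] == value]
    # Toy model: with an index only the matching rows are touched,
    # without one every row in the list has to be examined.
    rows_processed = len(matches) if column in indexed_columns else len(rows)
    if rows_processed > LIST_VIEW_THRESHOLD:
        raise ThresholdExceeded(
            f"{rows_processed} rows processed, limit is {LIST_VIEW_THRESHOLD}"
        )
    return matches


# 6,000 fake documents, only two of which belong to 'alice'.
rows = [{"DocumentManager": "alice" if i < 2 else "someone else"} for i in range(6000)]

found = query(rows, "DocumentManager", "alice", indexed_columns={"DocumentManager"})
print("with index:", len(found), "results")

try:
    query(rows, "DocumentManager", "alice", indexed_columns=set())
except ThresholdExceeded as err:
    print("without index: blocked,", err)
```

Same list, same filter, same two contracts. The only difference is whether the query can get to them without walking past the other 5,998 items.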

So, the solution is adding the indexed filter column to all document libraries in the site collection and... CRASH! WTF? We did everything correctly and still nothing.

"Statement" by Microsoft
(This is not the actual statement but a, valid, free interpretation)
The CQWP only uses the indexed columns when a single list is selected as the source. The CQWP ignores the indexed columns when the source is anything other than a single list!
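
Taking that free interpretation at face value, the outcome for the Contoso scenario is easy to predict. This Python sketch encodes the rule exactly as paraphrased above, so it is only as accurate as that paraphrase. The `is_blocked` function and the library sizes are made up; the sizes add up to the 25,000 documents from the story.

```python
LIST_VIEW_THRESHOLD = 5000  # default value mentioned earlier


def is_blocked(scope, library_sizes, matching_items, filter_is_indexed):
    """Predict whether the query is refused, following the paraphrased rule.

    scope is either "list" or "site collection" (labels chosen for this sketch).
    """
    # The index only helps when the source is a single list.
    index_used = filter_is_indexed and scope == "list"
    rows_processed = matching_items if index_used else sum(library_sizes)
    return rows_processed > LIST_VIEW_THRESHOLD


single_list = is_blocked("list", [6000], matching_items=2, filter_is_indexed=True)
whole_site_collection = is_blocked(
    "site collection", [6000, 9000, 10000], matching_items=2, filter_is_indexed=True
)

print("single list blocked:", single_list)
print("site collection blocked:", whole_site_collection)
```

Indexing every library changes nothing in the second call, which is exactly the CRASH described above.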

And that my friends is how the cookie crumbles!

Sources:
Designing large lists and maximizing list performance (SharePoint Server 2010)
Data in SharePoint 2010 – Part 2 – Content Query Web Part
SharePoint Server 2010 capacity management: Software boundaries and limits
Inspecting the caml of a content query web part

Sunday, July 28, 2013

SharePoint in the Real World - Information Management - Analysis

Especially when dealing with content management, you need to know about INFORMATION.

Don't believe me, but believe the MCPs of the world and believe all those (big) technically driven implementations that failed.

You really need to know your information and how it is organized:
  • What information will your intranet, DMS, BI solution, etc contain?
  • Who will be using it and how?
  • How do they access (...and find) it?
  • When do they need it?

Microsoft set up some starter worksheets that WILL help:
I suggest you use this information and implement it in the mother of all enterprise applications: Excel (or even better, in a database). Why?

[Sarcasm]
  • Maybe you want to modify your information structure at some point… where within our 1,000 sites did we use this site column? ;-)
  • What happens when we modify our taxonomy?
[/Sarcasm]
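
This is where the "even better, in a database" remark pays off. A minimal Python sketch using the built-in sqlite3 module, assuming you record one row per place a site column is used; the table layout, site URLs and column names are all made up for illustration.

```python
import sqlite3

# In-memory database for the sketch; a real inventory would live in a file or on a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE column_usage (site_url TEXT, content_type TEXT, site_column TEXT)"
)
conn.executemany(
    "INSERT INTO column_usage VALUES (?, ?, ?)",
    [
        ("/sites/dms/legal", "Contract", "DocumentManager"),
        ("/sites/dms/legal", "Contract", "ContractEndDate"),
        ("/sites/dms/hr", "Contract", "DocumentManager"),
        ("/sites/dms/support", "Complaint", "ComplaintStatus"),
    ],
)


def sites_using(column):
    """Answer the question: where did we use this site column?"""
    rows = conn.execute(
        "SELECT DISTINCT site_url FROM column_usage "
        "WHERE site_column = ? ORDER BY site_url",
        (column,),
    ).fetchall()
    return [row[0] for row in rows]


print(sites_using("DocumentManager"))
```

With 1,000 sites the query stays the same. Try that with a folder full of worksheets.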

So again:

Analyse the information that you will put into your SharePoint environment. Write it down and publish it to the people involved.

Information Management - Findability & Data


Information is only useful if users are able to find or use it (even if they don't even know they are using it, dashboard info for instance)!

Figure out what information you are going to store, or already store, in your environment. Environment, not SharePoint. SharePoint is 'only' the enabler. The next thing you need to do is determine how your users want to find the information they need. Not how you (or your sponsors) want your users to find certain information.

So you need to determine:
 
Tools & Techniques
Okay, so what tools and techniques does SharePoint offer OOB? With OOB I mean everything(!) short of having to call the development team. Don't get scared, SharePoint offers a lot, like: RSS, Alert Me, CQWP, filter web parts, search (esp. 2010 offers a bunch of great new features), but also your navigation and plain and simple browsing.

Data
Your user is in need of input!
They need lots of it. Not because they get a kick out of it but simply because they need it to get the job done.
What kind of data does the average user require?
  • SharePoint data
    • Corporate
      • Legal documents
    • Department
      • Contracts, agreements, SLAs, product info, status reports, agendas
    • Project
      • Tasks, milestones, estimates, to-dos, results (past, current and estimates)
    • Team
      • Again tasks, documents, etc
    • Social
      • Who's who?, Who's available, availability
  • All content that lives in SharePoint
  • Fileshares
  • LOB
  • Any other data that is available via an API