Monday, January 7, 2008

Podcast- Introduction to WSS 3.0, part 1

My apologies to Richard; this should have been done sooner, but between the contract gig I am on and the holidays, time ran away with me.

This podcast contains some introductory information about WSS 3.0, covering topics such as:

  • what WSS 3.0 is,
  • what the difference is, especially in terms of licensing, between WSS and MOSS,
  • some foundational knowledge about the services WSS uses under the hood,
  • the basic concepts of the difference between basic and server farm installs,
  • why SharePoint uses web applications, service accounts, and more.

    To access the podcast just click on the title of this entry.

    Richard, I also wanted you to know that I was startled during the recording of the podcast and accidentally stood up and yanked the laptop off the table by the headset cord, and consequently, the pitch of the recording did change. Please let me know if it is seriously noticeable-- because I will be bummed if I damaged my laptop's USB connector or headset.

    Tomorrow I have a conference call or two, and will be building slides, but feel free to email me with comments and/or questions.

    Thanks for listening and giving me reasons to do podcasts. : ) And for those of you stopping by and checking out the 'cast, feel free to comment as well. Just please be gentle, these podcasts are being created as time permits between work and sleep, and for free.

    Maybe later, if these things turn out to be useful, I'll invest some money into it. ; )

    -callahan

Saturday, January 5, 2008

And now for something completely different-- Searching PDFs, or Using Adobe's PDF IFilter with WSS 3.0 SP1

(Actually the version of WSS 3.0 doesn't matter (whether or not SP1 is applied); what matters is the PDF ifilter.)

You may have installed WSS 3.0 on a server or two and had no problems with Search. It works great, indexing content on a tidy schedule with no mishaps-- users can search for anything in a site collection, from announcements to the contents of big Word documents. But, when the users try to get fancy and introduce PDFs to the mix, things get tricky because WSS doesn't search the content of those.

You see, WSS 3.0 can search standard Windows File types (meaning Office file types and text files mostly). Supposedly it can also, out of the box, identify characters in OCR'd TIF files. However, it cannot search PDF files.

Why? Because MS only offers index filter files (files that teach the WSS indexing service how to gather data ) for their file types.

However, you could get an index filter file (called an ifilter) by downloading it from Adobe if you wanted to be able to index PDF file contents.

And that worked for versions 5 and 6, but when version 8.0 came out, things changed. You see, you used to be able to download the older ifilters straight from Adobe, but suddenly, you can't for version 8.0.

Why? Because the ifilter file is now bundled with Adobe Reader. So to get the ifilter, you have to install Adobe Reader (8.0 or higher) onto the WSS server that will be doing indexing.

However, not a lot of people know that. Which is why there are people in the public newsgroups having problems indexing their pdf files in WSS 3.0 (or higher). So either people don't realize that they need a pdf related ifilter or people do download and use the older Adobe PDF ifilter files, thinking that'll do it.

But people who download the older ifilters find they can only index PDF files of that version and lower, not non-Adobe PDFs or higher versions. To index older versions, non-Adobe PDFs, and the newest versions of PDF, you need the newest version of the ifilter (currently 8.1.1).

Of course, a company called FoxIt capitalized on people's confusion about getting and using ifilters by offering their PDF IFilter-- for a pretty penny of course.

But Adobe is still offering their ifilter for free. The only price you pay is having to install Adobe Reader on the server. And if that is too steep, then, well, it's good to know now before going any further.

There are a few tricks to getting the ifilter to work with WSS. Basically WSS needs to know it's there and what extension to use it on, so a few registry changes are in order.

Namely you will have to edit the registry to add the PDF file type to the Extensions List for WSS search, and to map the extension to a particular ifilter.

To do that go to regedit (go to Start Menu>Run> type regedit, hit Enter). Once in the registry, open the key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{ANYGUID}\Gather\Search\Extensions\ExtensionList

The extension list is full of all the extensions that the indexing service (the Gatherer as it were) should recognize, listed as the string values of consecutive numbers, containing value data that indicates the extension.

To add PDF to the list, you simply find the highest number in the list (it goes in order, 1,2, 3, 4... up to the last of them), add a String Value that is the next higher value (so if the highest value was 37 for example, the string value you would add is 38), and enter "pdf" for the Value Data.
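If you would rather double check the "next number" logic than eyeball it, here is a minimal Python sketch of the rule described above. It works on a plain dictionary standing in for the ExtensionList values, so it does not touch the registry at all; the function name add_extension and the sample data are my own illustrations, not anything that ships with WSS.

```python
def add_extension(extension_list: dict[str, str], ext: str) -> dict[str, str]:
    """Return a copy of the ExtensionList values with `ext` added under the next number."""
    ext = ext.lower().lstrip(".")

    # Nothing to add if the extension is already in the list.
    if ext in (value.lower() for value in extension_list.values()):
        return dict(extension_list)

    # Value names are numeric strings ("1", "2", ...); skip anything that is not a number.
    numbers = [int(name) for name in extension_list if name.isdigit()]
    next_name = str(max(numbers, default=0) + 1)

    updated = dict(extension_list)
    updated[next_name] = ext
    return updated


# Hypothetical sample data standing in for the real registry values.
sample = {"1": "doc", "2": "txt", "37": "xls"}
print(add_extension(sample, "pdf"))
```

On a real server you would still make the change in regedit (or script it yourself); the sketch only shows how the new value name is chosen.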

Then go to the next key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension

Listed here should be the file extensions, each with the CLSID (class ID) of the ifilter used to index that extension. If .pdf is not listed, add it (it should have a multi-string value). In that multi-string value, you need to add the CLSID for the ifilter added by Adobe Reader. For version 8.1.1, this file is called "AcroRDIF.dll". You can look up its CLSID by doing a Find under the CLSID key (under HKEY_CLASSES_ROOT).

[[ Edited to add-- the CLSID that you are looking for to do filtering is under HKEY_CLASSES_ROOT\CLSID\ --I mention this because there are other CLSIDs that relate to other functions for acrobat in the registry. The one we need has the Reg SZ value of "PDF Filter".]]

Or, conveniently, because the CLSID is the same for version 8.1.1 as it is for the 8.0 version of Adobe Reader and is posted on the web in several places, you can just type it in:

{E8978DA6-047F-4E3D-9C78-CDBE46041603}

Mind you, if you are using a different, newer version, this CLSID may not work, so you'll need to find out the ifilter file name for your version and then search for it in the CLSID key in the registry. I find that searching on the true filesystem path plus the file name works better than using the file name alone, because the dll can be listed in a few places. You may need to experiment.

Anyway, I digress-- Once you've either found or added the .pdf key, enter the CLSID for the value (be sure to include the fancy brackets).
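Since a missing bracket or a dropped character in the CLSID is an easy mistake to make when typing it by hand, a quick format check can help. The following is a small Python sketch, assuming only that a CLSID is a standard GUID wrapped in curly brackets; the helper name is_braced_clsid is hypothetical, and it checks the shape of the string only, not whether that class is actually registered on your server.

```python
import re

# A braced GUID: 8-4-4-4-12 hexadecimal digits inside curly brackets.
CLSID_PATTERN = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def is_braced_clsid(value: str) -> bool:
    """Return True if `value` looks like a CLSID, including the surrounding brackets."""
    return CLSID_PATTERN.fullmatch(value.strip()) is not None


# The CLSID quoted in this post for the Adobe Reader 8.0 / 8.1.1 ifilter.
print(is_braced_clsid("{E8978DA6-047F-4E3D-9C78-CDBE46041603}"))
# The same value with the brackets left off, which is the mistake to watch for.
print(is_braced_clsid("E8978DA6-047F-4E3D-9C78-CDBE46041603"))
```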

To let the server know where the Adobe Reader executable and its associated files are, add its path to the environment variables of the server.

(Start Menu>right click My Computer>select Properties>go to the Advanced tab>click on the Environment Variables button and scroll down to the Path variable>select it and click on the Edit button>add the path ";C:\Program Files\Adobe\Reader 8.0\Reader" to the end (be sure to use the correct folder if you are using something newer than 8.x)>then click OK to apply and close)
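For anyone who wants to see what that Path edit amounts to, here is a minimal Python sketch of appending a folder to a semicolon-separated Path value without adding it twice. It only manipulates a string; it does not change any environment variable on the machine. The function name append_to_path and the sample Path value are made up for illustration.

```python
def append_to_path(path_value: str, new_dir: str) -> str:
    """Append `new_dir` to a semicolon-separated Path value unless it is already there."""
    # Split on semicolons and drop empty entries left by stray separators.
    entries = [entry for entry in path_value.split(";") if entry.strip()]

    def normalize(folder: str) -> str:
        # Windows paths are case-insensitive, and a trailing backslash does not matter.
        return folder.strip().rstrip("\\").lower()

    if normalize(new_dir) in {normalize(entry) for entry in entries}:
        return ";".join(entries)
    return ";".join(entries + [new_dir])


# Hypothetical starting value; your server's Path will be much longer.
current = r"C:\Windows\system32;C:\Windows"
print(append_to_path(current, r"C:\Program Files\Adobe\Reader 8.0\Reader"))
```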

Finally, to let WSS know that it needs to index PDF files now, you can do one of two things:

1) reboot the server (seems to always work, but may not be possible in your environment). Instinctively, I guess because I am old school, I reboot when making changes to the registry.

OR

2) First stop and restart the Windows SharePoint Search Service (at the command prompt, use "net stop spsearch" then "net start spsearch"). Then, if you don't want to wait for it to index on its preset schedule, force the index service to do a full crawl using STSADM:

stsadm -o spsearch -action fullcrawlstop

(again that's a bit of instinct there, I am figuring if it happens to be in the middle of a crawl I want it to stop and start over using the new ifilter)

stsadm -o spsearch -action fullcrawlstart

Remember that the stsadm command is in the C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN folder (in case that folder isn't already in your Path environment variable).
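To keep the order of option 2 straight, the whole sequence can be sketched as a list of commands. This is a minimal Python sketch that only builds and prints the command lines from this post; it does not run anything. The function name recrawl_commands is my own, and the stsadm path assumes the default install location mentioned above.

```python
# Default location of stsadm, per the folder named in this post (an assumption for your server).
STSADM = r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe"


def recrawl_commands(stsadm_path: str = STSADM) -> list[list[str]]:
    """Return the commands, in order, to restart WSS search and force a full crawl."""
    return [
        ["net", "stop", "spsearch"],
        ["net", "start", "spsearch"],
        # Stop any crawl already in progress, then start a fresh full crawl.
        [stsadm_path, "-o", "spsearch", "-action", "fullcrawlstop"],
        [stsadm_path, "-o", "spsearch", "-action", "fullcrawlstart"],
    ]


for command in recrawl_commands():
    # Quote any argument containing spaces so the line can be pasted into a command prompt.
    print(" ".join(f'"{arg}"' if " " in arg else arg for arg in command))
```

If you do want to run them from a script, feeding each list to something like Python's subprocess.run on the server itself would be one way, but test it somewhere safe first.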

Regardless of which you choose, it may take some time for the pdf files to be indexed properly. I have found the second option to not be as guaranteed to work as simply rebooting and waiting for it to index on its own. But, no matter how long it initially takes, I found this free and relatively easy solution to indexing PDF files with WSS 3.0 to work, every time.

Thursday, January 3, 2008

Errata on Mastering WSS 3.0

The book is not yet out and I am already finding little things that bother me about it.

Now mind you, I did not write the book alone. I had numerous editors, both to proof the words and to check the technical accuracy. To add to the madness, I had several coauthors, individuals with their own way of saying things.

Together though, that can create some chaos. Let me demonstrate.

  • Chapter 14-- For some reason Ron, who had to use the SharePoint installer, SharePoint.exe, to install WSS 3.0, called it "SharePointServices.exe." I don't know why.

  • Folders-- folders were mentioned in chapters 5 and 6, but, believe it or not, were going to get serious coverage in chapter 11, which is the chapter about permissions, and is where I feel folders come into their own. However, because of one thing or another, the entire "A good reason to use Folders" sidebar (a good sized chunk of text I might add) was omitted. I will be adding it as an article here for those who might be looking for it.

  • Content Types-- I had wanted to do more with content types, I mean, c'mon, I covered connected web parts didn't I? But, due to page constraints I made do with something of a cop-out by covering them only as a way to add a template to a library. I will be doing things with content types here that I intended to do in the book. Mind you, I was given to believe there were several other books by Wiley that would cover content types, but as I have never seen them, I am not going to depend on that.

  • SharePoint Designer-- There were supposed to be three chapters about using SharePoint Designer (SD) to customize WSS 3.0 as an IT admin, but they had to be cut out due to time, coauthor issues, and page limits.

  • Chapter 11 (written by Bill Chapman, reworked by Charles Firth)-- "site groups can be made at the subsite level"-- while technically true, this could be worded to be more true. I mean that yes, you can use the New Group button to add a group and use it at a subsite level if the subsite is not inheriting permissions, but that group is actually in the list for the site collection as if it were created at the top level site. That's the point of the groups list for the whole site collection-- to list all the groups in the site collection. That includes those subsites that were created from the start not to inherit and therefore have their own groups. So actually, even if a site group is made at the subsite, it might appear as if it were made at the site collection level. Site groups are just named groups that could contain people, and then the permissions used at a site collection or lower level are applied to the group. So groups can be made anywhere in the site collection hierarchy; it doesn't really matter where. The difference is, there are no permissions being applied to the group at any level but the site that made it (modified it or is using it). See, that's what is really implied with "permission inheritance." ...well, more on that, in detail, later in this blog.

  • Server 2008-- Because the book is printing so, super, incredibly late, Server 2008 will be out by the time it hits the shelves. The book was entirely written on Server 2003. So what does that mean? Well, I could have stopped production and scrapped the book altogether (it was late enough to cancel at this point... one more delay and bam, right in the kisser), but I found that only certain features are different. So instead of scrapping the book, I decided to publish as is, and instead I'll write the changes here.

Those changes are, namely:

  • IIS 7.0 management. Oh, it says that it supports IIS 6.0, but not really.
      • IIS Web Sites and Application Pools are backed up and restored differently.
  • Reliability and Performance Monitor.
      • So creating alerts and logs is going to be different.

I will be rewriting those sections to reflect the step by steps necessary to do those tasks in 2008. As a matter of fact, if I can get time to install Camtasia, I'll just demo them. However, keep in mind that, out of a thousand pages, there were very few big differences.

I had to make the decision as to whether to publish or not, and decided to go for it. Please forgive me if that doesn't work for you, and email me to let me know why and what can be done to fix it.

Please, feel free to comment here with other errata (or use the errata email address listed in the errata widget near the top right of this page). I worked really, really hard to make sure everything I said was correct, but as with any project that has many chiefs, unexpected errors can happen. I am serious. I am depending on you to let me know if anything in there is misworded, misnamed, or misleading. I really don't like the idea that anyone might be misinformed by any written material, and I always want to fix it, if possible.

More notes on what I find that might be improved as I find them...


Edited to add More Errata (July 2008):

I was glancing through the book, trying to get ideas for sessions at the EMEA TechEd conference in Barcelona, when I happened to notice a sidebar (pages 347, 348, and 349) called "Seeing is Believing." Disregarding the bad layout of the sidebar starting at the very bottom of the initial page, and the fact that the figures could have been shrunk a little to better fit the sidebar, the second figure in the sidebar is incorrect. The sidebar contains two copies of the same picture, when the first one should show the library with the "shareadmin" login, and the second picture, which in this book is a repeat, should show "saffron" logged in, with the document in question missing. Unfortunately, the sidebar makes far less sense with two copies of the same picture than it would if the second picture were correct.

You have my sympathy and apologies if this made no sense to you too. The copy I sent in had the correct pictures listed. It appears that something was lost in translation...


Ah-- found another problem-- on page 251, in "How to Avoid Blank Date Values," the figure it refers to in "You might have noticed in figure 5.38..." should be figure 5.39. Figure 5.39 displays the list with a record that has a blank Expires field. 5.38 is a picture of an email being used to add an item to the Announcements list.

Since the template the publisher required could not do automatic numbering, every sequential set of numbers (like figures, step by step lists, etc.) had to be manually entered, and updated, every single time there was a change.

Yes, that is painfully inefficient (imagine dealing with it for a thousand pages), but that's the publishing biz for you. So during edits, a figure was added, removed, or moved around, without changing the reference in the text, causing the error. My apologies.