SharePoint Document Management - Folder Discontent

A large number of organizations still live in shared folder hell, plagued by the following issues:

  • Documents are scattered deep and wide in a vast hierarchy of folders across many shared folder locations.

  • Duplicate content is everywhere.

  • Organizational records become buried amongst the shared folders and are overlooked by information management policies that govern things such as the record’s retention policy.

  • Security holes are common because access rights are not propagated to sub-folders, leaving sensitive content exposed.

  • In many cases, a shared folder hierarchy represents a department’s or an organization’s content taxonomy.  A document in a shared folder hierarchy is implicitly “tagged” with metadata, namely the names of the folders above it.  The problem is that when the document is copied or moved, the metadata (i.e. the names of the folders) does not travel with the document.

  • Finding information can be a time-consuming and costly endeavour.  Yes, costly; see “The high cost of not finding information”.

It’s obviously a no-brainer to move those organizations from shared folders to a document management system.  And many companies are using, or thinking about using, SharePoint as a document management solution.

You might think that using SharePoint 2007 as a document management system will cure all the shared folder ills that I highlighted above.  Wrong.  SharePoint 2007 has a core performance limitation that will require you to use folders (or indexed views) when the number of documents in a single view starts to exceed 2000.  The same applies to SharePoint lists.

Tests show that the performance of libraries starts tanking as the number of documents/items in a library view approaches and exceeds 2000.  The recommended way to avoid the performance degradation trap is to create folders within libraries so that each folder holds fewer than 2000 documents.  The TechNet article “Plan for software boundaries (Office SharePoint Server)” talks in detail about this performance limitation and how to deal with it using folders.  It very clearly states that folders are “critical for scaling”.  So, unfortunately, if you expect to store more than 2000 documents in a library you’re faced with either:

  1. Using folders in a document library and having to deal with many of the same problems as dealing with server-based shared folders.

  2. Using indexed views (views that use indexed columns).  This helps improve performance for libraries with 2000+ documents, but performance is not as good as with folders.

  3. Using multiple document libraries that each store no more than 2000 documents.
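
As a rough illustration of option 1, the sketch below assigns documents to numbered folders so that no folder holds more than 2000 items. It is not SharePoint code: the function name, the `batch-NNN` folder names, and the 2000 figure as a default are assumptions made for the example, and the 2000 figure is a guideline rather than a hard limit.

```python
from collections import defaultdict

# Guideline figure discussed in the article; an assumption, not a hard limit.
MAX_ITEMS_PER_FOLDER = 2000


def partition_into_folders(doc_names, max_items=MAX_ITEMS_PER_FOLDER):
    """Assign documents to numbered folders so no folder exceeds max_items.

    Returns a dict mapping a hypothetical folder name to its documents.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    folders = defaultdict(list)
    # Sort first so the assignment is deterministic for a given input.
    for index, name in enumerate(sorted(doc_names)):
        folder_name = f"batch-{index // max_items + 1:03d}"
        folders[folder_name].append(name)
    return dict(folders)


if __name__ == "__main__":
    docs = [f"doc-{i:05d}.docx" for i in range(10000)]
    for folder, items in partition_into_folders(docs).items():
        print(folder, len(items))
```

Note that folder names produced this way carry no meaning for the user, which is exactly the kind of trade-off between performance and findability that this post is complaining about.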

Options 2 (indexed views) and 3 (multiple document libraries) may alleviate the need to use folders.  However, they potentially force you to deviate from your information architecture (IA) by using multiple libraries and views that do not conform to IA requirements for the organization and presentation of content.  Those options can also introduce additional administrative overhead in managing the many libraries, views, and indexed columns.

Ultimately, if an organization needs to store many thousands of documents in SharePoint 2007 they will likely have to break down the libraries into hierarchies of sub-folders.  And now you’re back to square one dealing with folder hell.  Will the next version of SharePoint address the performance limitation and eliminate the need for folders?

7 Responses to “SharePoint Document Management - Folder Discontent”


  1. Mark Miller

    Steve - It’s not the total number of documents that’s the limitation, it’s the total number of items in a single view. You can have millions of docs in a library and as long as no view exposes more than 2000 items, you’re good to go.

    Regards,
    Mark Miller, Founder and Editor,
    EndUserSharePoint.com

  2. Steve

    Mark, thanks for pointing out that the performance limitation is on the view, not the library itself (which can accommodate up to 5 million documents). I knew the limitation was on the view but didn’t catch the oversight before posting. I’ve corrected the article. I should also clarify that I’m not trying to be a scaremonger and that the 2000 item limit is not a hard limit. Depending on your hardware and the load on the system, performance can still be acceptable beyond 2000 items.

    The problem of dealing with folders will still remain in many cases where the growth of the library and views makes it necessary to create folders. I agree that using views, especially indexed views, can alleviate the need for folders in cases where the total items in the library can be adequately partitioned by views. And, of course, there are the enterprise search capabilities of MOSS that can make it very easy to find content in situations where the desired document is not easily located using a view. However, adding more and more views as the size of the library or list grows will run out of steam and become unmanageable. That leaves you with the options of creating additional libraries and/or using folders. There are other approaches, like setting up some kind of archiving for “inactive” content that reaches a certain age, which can help keep libraries trimmed and views manageable. Then when older, “inactive” content is required, the Search feature can be used to find it in the archive. Are there other approaches you would recommend, Mark?

    Ultimately I’d like to see SharePoint improved to remove the 2000 item “limit” on views and to enable libraries to grow unencumbered by folders. Libraries/lists need to provide the same kind of paging capabilities as you see in many products and sites that present huge volumes of results. Views would still be used of course but you wouldn’t be faced with a drop in performance when views hit the 2000 item limit. I don’t know anything about SharePoint’s core engine so perhaps there are good/significant reasons why addressing this performance limit is not easily achieved?

  3. Jeff

    I am not sure that Mark is correct. The MS guidance in “Planning and architecture for Windows SharePoint Services 3.0 technology, part 2” seems to indicate they are referring to performance as a function of total library size. They have some nice graphs showing performance (measured in transactions per second) for a flat library that show the 200, 2000 and 4000 performance “cliffs” clearly. There is also a comparison of performance for folder views (500 docs/folder) vs indexed views up to 1M total docs. They clearly indicate that “performance is better when folders are used.”

    I am in agreement w/ the frustration that I sense from Steve as I try to begin a SP project. The entire community is clearly geared to abhor the folder, and I see the benefits of the “tagged” doc approach of SP. But the built-in library performance limitations and MS’s own advice drive the need for folders. The disconnect is especially painful for me as I am new to the game and have a project w/ a known library size of 10k docs, but there is a dearth of articles/guidance on how to implement a folder structure to deliver optimized performance of the product. Uugghh!!

  4. Bharath

    Hi Guys,

    I have a shared folder which has to be moved to a document library; its size is 13.3 GB, with 19500 files and 980 folders.
    I’m moving this to SharePoint and I am worried about the performance and the query time required whenever I search, sort, or filter data.

    Just lemme know the best practice or some links..

    Thanks,
    Bharath

  5. Mark

    Yes, I faced the same issues, in spades! I had one dept with 1500+ documents in 300+ folders, and most definitely, the folder names were the metadata. Some folder names were as simple as “1”, meaning from the taxonomy “Product = XYZ; Major Version = 1”. The sub-folders “1”, “2”, and “3” would correspond to “Product = XYZ; Major Version = 1; Minor Version = 1”; “Product = XYZ; Major Version = 1; Minor Version = 2”; and “Product = XYZ; Major Version = 1; Minor Version = 3”.

    My options were to move as is, or create and attach metadata like “Prod XYZ.1.1”, “Prod XYZ.1.2”, “Prod XYZ.1.3” to 1500+ documents.

    I WISH there was time to build a tool as follows, but maybe the concept can help: One Workflow runs against a document. It can grab the path (like “Products/XYZ/1/3/specs.doc”) and store it into a new field (not needed in the view) with the document. A second field decomposes the path based on the location of each slash, and synthesizes a metadata value (like “Prod XYZ.1.3”). Copy that computation to a third field as a SLOT, and voila, that document is now tagged with the metadata value “Prod XYZ.1.3”. Then, create a view that groups by the metadata and ignores the folders.

    FOR EXTRA CREDIT, change the document name from “specs.doc” to “Prod XYZ.1.3 Specs.doc”.
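
The path decomposition described in this comment can be sketched roughly as follows. This is plain Python rather than a SharePoint workflow, and it assumes every document sits at exactly `Products/<product>/<major>/<minor>/<file>`; the function names and the `root` and `prefix` parameters are hypothetical, chosen only for the illustration.

```python
from pathlib import PurePosixPath


def metadata_from_path(path, root="Products", prefix="Prod"):
    """Derive a tag such as 'Prod XYZ.1.3' from 'Products/XYZ/1/3/specs.doc'.

    Assumes the layout <root>/<product>/<major>/<minor>/<file>. Returns None
    when the path does not match, so misfiled documents can be reviewed by hand.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[0] != root:
        return None
    _, product, major, minor, _ = parts
    return f"{prefix} {product}.{major}.{minor}"


def tagged_name(path, **kwargs):
    """Prefix the file name with its tag (the 'extra credit' rename)."""
    name = PurePosixPath(path).name
    tag = metadata_from_path(path, **kwargs)
    if tag is None:
        return name  # leave unmatched files untouched
    # Capitalize the first letter to mirror the 'Specs.doc' example.
    return f"{tag} {name[:1].upper()}{name[1:]}"


if __name__ == "__main__":
    sample = "Products/XYZ/1/3/specs.doc"
    print(metadata_from_path(sample))
    print(tagged_name(sample))
```

Returning `None` for paths that do not fit the expected depth is a deliberate choice here: documents filed in unexpected places are flagged rather than given a guessed tag.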

  6. Steve

    Mark, your suggested migration workflow (from shared folders) into SharePoint with preservation of folder path as metadata is a superb concept. I don’t think it would be that difficult at all to write a component to do that. Of course, the nature of shared folders is that someone may have placed a document in the wrong folder and … it would end up with the wrong metadata in SharePoint but for the vast majority of documents I think your migration idea would work very well. And, another big benefit of your approach is that with SharePoint Enterprise Search, the folder metadata value can be indexed and can subsequently be searched. So, for users accustomed to working in their shared folders for years, they can simply go to the SharePoint search field, and search for documents containing a certain folder name by searching against the folder metadata property. Brilliant!

  7. Brice

    I know I am late to this party, but you could do something like that using a migration tool.

    I am currently looking into one by Metalogix, here: http://www.metalogix.net/Products/FileShare-Migration-Manager-for-Sharepoint/

    They have several different products that interface with various systems: File System, Documentum/eRoom, custom web-based DMS, etc. They are very helpful and service-oriented, as well.

    My task is to index a folder full of files (and subfolders), extract path metadata and merge it with an Excel spreadsheet that contains a ‘Master Document List’ with more metadata. Then the whole lot needs to be crammed into SharePoint. Supposedly this will help do the trick (recommendations from a good friend), but I haven’t had the opportunity to check it out fully, yet.
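
Independently of any migration product, the index-and-merge step described here might look something like the sketch below. It assumes the master list has been exported from Excel to CSV, that it has a `FileName` column, and that file names are unique across the folder tree; those assumptions, along with the function and column names, are hypothetical and not taken from any tool mentioned in this thread.

```python
import csv
import os


def index_files(root):
    """Yield one dict per file under root, recording its relative folder path."""
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        levels = [] if rel_dir == "." else rel_dir.split(os.sep)
        for filename in filenames:
            yield {"FileName": filename, "FolderPath": "/".join(levels)}


def load_master_list(csv_path, key="FileName"):
    """Read the master list (exported to CSV) into a dict keyed by file name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row[key]: row for row in csv.DictReader(handle)}


def merge_metadata(root, master):
    """Combine path metadata with master-list rows.

    Files missing from the master list keep their path metadata only.
    """
    merged = []
    for record in index_files(root):
        extra = master.get(record["FileName"], {})
        merged.append({**extra, **record})
    return merged
```

A typical call would be `merge_metadata("/path/to/share", load_master_list("master.csv"))`, with both paths standing in for your own locations. The merged rows could then feed whatever upload mechanism is used.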

    I am also looking into storing documents externally with StoragePoint. Seamless integration for end-users. They never know the difference, and network storage/SAN is cheaper than more SQL boxes, processors, licenses, management, etc. This is a pretty good product: http://www.storagepoint.com

    Good luck, all.
