SharePoint Foundation as a Search API Layer for Hobbyist Apps

I’ve been a SharePoint developer on a variety of customer projects. At least 1/4 of my work has been focused on Enterprise Search using SharePoint and previously FAST Search. Most recently, I worked on a common web content management platform using cross-site publishing and extensive UI work involving the SharePoint Search REST API.

I make mobile apps both professionally (as a Microsoft consultant, I’ve worked on Windows and mobile web apps) and as a hobby. My primary hobby app, which I maintain as a service for the military enthusiast, survivalist, and Army communities, is Army Field Manuals. It provides a library of PDF files published by the U.S. Army.

I’ve also recently completed my MIM degree, in which I had some close encounters with librarian- and archivist-type people. These are dedicated folks who organize and classify information, and many of them are extremely tech-savvy. This grad school work inspired me to do a better job than the Army has done of exposing the public contents of Army doctrine. The primary ways I’ve come up with to do this are to integrate a variety of sources, provide some manual classification, and provide a robust search interface.

How do I do full-text search in a mobile (Windows) application?

Search in my app currently covers only the titles (and soon I’ll be adding descriptions). Unlocking the full text of the documents for search requires (at least in Microsoft lingo) an IFilter for PDFs. The technology is built into Windows 8.1, which provides hooks through IndexableContent, but that index only covers content that is already on the device. Since the 3GB of field manuals are downloaded to the app clients only on demand, there’s no way to provide full-text search across the whole corpus from the client.

Enter SharePoint Search. SharePoint can search PDFs natively, can provide full-text results with hit highlighting, and can return the metadata associated with each document in context. It can also provide rank tuning; in my case, I might want to boost highly rated documents (LibraryThing has some ratings). Blending all these ideas, I had an epiphany that SharePoint could provide some good value for my app community, so I’ve been figuring out how this might work and still be affordable.

What about SharePoint Online for Office 365?

I’ve tried this, but the public-facing site you get does not seem to allow anonymous users to call the search API. When I run the query below while logged in, it works. As an anonymous user, I get a file-not-found error. In the SharePoint Foundation solution described later in this post, the same query works fine, so I’m assuming this is a specific exclusion of functionality in the public site on Office 365.

/_api/search/query?querytext=%27army%27&QueryTemplatePropertiesUrl=%27spfile://webroot/queryparametertemplate.xml%27&SummaryLength=360&KeywordInclusion=%271%27
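
For anyone who wants to assemble that URL from code, here is a minimal Python sketch. The host name and the function name are placeholders of mine, and the parameter values are simply the ones from the query above; I’m assuming the usual OData convention that a single quote inside a string literal is escaped by doubling it.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, summary_length=360):
    """Build the search REST URL shown above for a given site and query.

    site_url is a placeholder for wherever the SharePoint site is hosted.
    """
    # Assumption: OData string literals escape a single quote by doubling it.
    escaped = query_text.replace("'", "''")
    params = [
        ("querytext", f"'{escaped}'"),
        ("QueryTemplatePropertiesUrl",
         "'spfile://webroot/queryparametertemplate.xml'"),
        ("SummaryLength", str(summary_length)),
        ("KeywordInclusion", "'1'"),
    ]
    # Percent-encode the quotes (%27) but leave ':' and '/' readable,
    # matching the form of the URL in the post.
    query = "&".join(f"{key}={quote(value, safe=':/')}" for key, value in params)
    return f"{site_url.rstrip('/')}/_api/search/query?{query}"


if __name__ == "__main__":
    # "https://example.invalid" is a stand-in host, not a real site.
    print(build_search_url("https://example.invalid", "army"))
```

Calling it with "army" reproduces the query string above, with the placeholder host in front.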

What’s the cheapest way to use SharePoint’s API layer?

Shared hosting of SharePoint Foundation is cheapest. Even though no CALs are required for External Users in the other versions, you’d need a CAL for yourself at least, and where would you purchase that as a hobbyist developer? On top of that, the third-party hosting providers charge a lot more for Standard or Enterprise licensed SKUs. I’m still discovering what SharePoint Foundation can do, since I typically work with the Enterprise SKU of SharePoint. However, I’ve signed up for a 3GB plan (enough to hold all the public Army doctrine PDF documents) on PlexHosted.com (affiliate link) for about $25/mo, and I’ve found that I can add site columns, access the search REST API anonymously, and access the lists API as well. So far so good. The Term Store is not available; I was hoping to use that to help tag content. However, a multiselect lookup will work for what I’m doing in the short term.
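
To give a feel for what an app would do with the anonymous search endpoint, here is a rough Python sketch that requests JSON and flattens each result row into a plain dictionary. I’m assuming the verbose OData JSON shape (d → query → PrimaryQueryResult → RelevantResults → Table → Rows), which is what I’d expect when sending an "Accept: application/json;odata=verbose" header; the function names and the sample payload are made up for illustration, so check them against a real response before relying on them.

```python
import json
from urllib.request import Request, urlopen


def fetch_search_json(url, timeout=10):
    """Request a search URL anonymously and return the decoded JSON body."""
    # Without an Accept header the endpoint answers in XML by default.
    request = Request(url, headers={"Accept": "application/json;odata=verbose"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def rows_to_dicts(payload):
    """Flatten search rows into a list of {managed property: value} dicts.

    Assumes the verbose OData layout described in the lead-in.
    """
    table = payload["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]["Table"]
    return [
        {cell["Key"]: cell["Value"] for cell in row["Cells"]["results"]}
        for row in table["Rows"]["results"]
    ]


if __name__ == "__main__":
    # Hand-written sample in the assumed shape; not real search output.
    sample = {"d": {"query": {"PrimaryQueryResult": {"RelevantResults": {"Table": {
        "Rows": {"results": [{"Cells": {"results": [
            {"Key": "Title", "Value": "Example Manual"},
            {"Key": "Path", "Value": "https://example.invalid/docs/example.pdf"},
        ]}}]}
    }}}}}}
    for hit in rows_to_dicts(sample):
        print(hit["Title"], hit["Path"])
```

Keeping the parsing separate from the network call means the flattening can be tried against a saved response without touching the hosted site.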

What are the limitations?

There’s more to APIs than anonymously exposing metadata and search results. This solution doesn’t do authentication, and it offers little in the way of security or protection against abuse. There is some logging, however, so hopefully that will prove useful. The system will also be on a shared instance to keep the costs manageable, and I’m not sure how well that would scale. Another downside I’ve found is that my search API access is lumped in with some other sites’ anonymous results. That doesn’t have any real impact for me, but it’s definitely not secured across site boundaries.