Laying the Solr Groundwork for NixMashup Links

I’m building an app that takes my NixMashup Link Posts, strips out the individual links, and creates a new post from each. The links will be available on a new website with features that include tagging, addressability, searching, and other fun stuff. More on the project and its features as they happen. Today we’ll lay the foundation for searching the individual links in Apache Solr.

I’ve blogged about Solr before, specifically on how to create a friendly local URL for development. That gives us a convenient http://solr URL to use, and since our Solr collection name is NixMashup, our full URL will be http://solr/nixmashup.

Sample Links

We’ll start by creating a few sample links to prime the Solr index and get comfortable with our new collection. Our nixmash.xml file contains only four records, but that is enough to get things underway. The XML was created from the original HTML, though we could just as easily have used JDOM or Jsoup to create it programmatically.
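For a feel of what building that file programmatically might look like, here is a minimal Python sketch that turns a list of link records into Solr's XML update format (an add element wrapping doc elements). Only link_id comes from this post; the title, body, and tags field names and the sample record are assumptions for illustration, not the actual contents of nixmash.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(links):
    """Return a Solr <add> XML string for a list of link dicts.

    Field names other than link_id are assumed for illustration.
    """
    add = ET.Element("add")
    for link in links:
        doc = ET.SubElement(add, "doc")
        # Single-valued fields: one <field> element each.
        for name in ("link_id", "title", "body"):
            ET.SubElement(doc, "field", name=name).text = str(link[name])
        # Multi-valued field: repeat the <field> element once per tag.
        for tag in link.get("tags", []):
            ET.SubElement(doc, "field", name="tags").text = tag
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample record, not one of the four real ones.
sample = [
    {
        "link_id": 1,
        "title": "Example link",
        "body": "Placeholder description",
        "tags": ["ubuntu", "linux"],
    }
]
print(build_add_xml(sample))
```

Writing the returned string to a file would give something that could stand in for a hand-built nixmash.xml, provided the field names match the schema.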

The Collection Schema.xml

Below is the Solr collection’s Schema.xml file, which reflects our document format. Individual fields and their index properties are configured (1), link_id is defined as the unique key field (2), and our catch-all “text” field is populated from the link title, text body, and tags fields (3).

Entering the Nixmash.xml Records

There are four records (documents) in the nixmash.xml file. Here are two options for entering them: Solr’s post.jar or curl.
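Both options boil down to an HTTP POST of the XML to the collection's update handler. As a rough Python equivalent, the sketch below assumes the handler lives at http://solr/nixmashup/update (the standard /update path appended to the friendly URL from earlier) and that an immediate commit is wanted. It is an illustration, not a replacement for the post.jar or curl commands.

```python
import urllib.request

SOLR_URL = "http://solr/nixmashup"  # collection URL from this post


def build_update_request(xml_bytes, base_url=SOLR_URL):
    """Build (but do not send) a POST to the assumed /update handler."""
    return urllib.request.Request(
        url=f"{base_url}/update?commit=true",  # commit=true makes the docs searchable right away
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(path, base_url=SOLR_URL):
    """Send an XML file to Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        request = build_update_request(f.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

With Solr running, calling post_file("nixmash.xml") should send the four documents in one request.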

Keeping It Clean

Here’s a bonus bash script for removing all documents from a collection if we ever want to start over.
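The same cleanup can be sketched in Python for anyone who prefers it to bash. It posts Solr's delete-by-query message with the match-all query *:* and commits. The /update path on http://solr/nixmashup is an assumption, and this is not the script from this post, only an equivalent idea.

```python
import urllib.request

# Delete-by-query message that matches every document in the collection.
DELETE_ALL = b"<delete><query>*:*</query></delete>"


def build_delete_all_request(base_url="http://solr/nixmashup"):
    """Build (but do not send) the request that empties the collection."""
    return urllib.request.Request(
        url=f"{base_url}/update?commit=true",  # assumed update handler path
        data=DELETE_ALL,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def delete_all(base_url="http://solr/nixmashup"):
    """Send the delete request and return the HTTP status code."""
    with urllib.request.urlopen(build_delete_all_request(base_url)) as response:
        return response.status
```

Calling delete_all() removes every document, so it is best kept well away from anything but a development collection.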

Playing in Solr

Our goal of searching and playing with our NixMashup links in the Solr Control Panel has been achieved. Here we search tags for “ubuntu.”
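The same search can be run outside the browser. Here is a minimal Python sketch, assuming the standard /select handler on http://solr/nixmashup, a field named tags, and a JSON response in Solr's usual response/docs shape.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(field, term, base_url="http://solr/nixmashup"):
    """Return a /select URL for a simple field:term query."""
    params = urllib.parse.urlencode({"q": f"{field}:{term}", "wt": "json"})
    return f"{base_url}/select?{params}"


def search(field, term, base_url="http://solr/nixmashup"):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(build_select_url(field, term, base_url)) as response:
        return json.load(response)["response"]["docs"]
```

With the sample links indexed, search("tags", "ubuntu") should return the documents tagged ubuntu.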