Jsoup Annotations: Examples

We introduced Jsoup Annotations in our last post where we listed the POJO property annotations that can be used to retrieve Jsoup parsed content. Today we'll see all of the annotations in action. All examples are found in the NixMash Spring jsoup module test source.

For a quick review, Jsoup Annotations are applied to the properties of a simple POJO class. The class is passed as a generic type to the JsoupHtmlParser, which populates the properties. The list of available annotations is:

  • @Selector — selects by tag, class or id. Used with @TextValue, @HtmlValue and @AttributeValue
  • @MetaName — select a Meta Tag by name attribute
  • @MetaProperty — select a Meta Tag by property attribute
  • @LinkSelector — retrieves href and text values of a Jsoup Link Element. Stores in a JsoupLink object
  • @ImageSelector — retrieves src, alt, height and width of Jsoup Src Element. Stores in JsoupImage object

Okay, let's look at examples!

@Selector

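We'll begin with the @Selector annotation. Our sample HTML has an element with the id “myid” and an element with the class “myclass”, which wraps its text in a span and carries a myattr attribute.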

@Selector("#myid")
@TextValue
public String myIdText;

Retrieves “This is my id text”. Notice we're retrieving an Id element so we include the “#” prefix.

_____________

@Selector(".myclass")
public String myClassText;

Retrieves “This is my class text”. Notice the “.” preceding the CSS class name. Also, when a field carries no value annotation, it defaults to @TextValue.

_____________

@Selector(".myclass")
@HtmlValue
public String myClassHtml;

Retrieves “<span>This is my class text</span>”.

_____________

@Selector(".myclass")
@AttributeValue(name="myattr")
public String myClassAttribute;

Retrieves “grouchy”, the myattr attribute value of the .myclass selected element.
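
As a rough illustration of how field annotations like these can drive a parser, here is a minimal, self-contained sketch using plain Java reflection. It is not the NixMash implementation: the annotation types, FakeElement and AnnotationPopulator are hypothetical stand-ins declared in the block itself, and a Map keyed by selector string takes the place of a parsed Jsoup document. It only shows the dispatch described above: the attribute value if @AttributeValue is present, the inner HTML if @HtmlValue is present, and the text otherwise.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the NixMash annotations; they only mirror the names in this post.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Selector { String value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface HtmlValue { }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface AttributeValue { String name(); }

// Hypothetical stand-in for one matched element: its text, inner HTML and attributes.
class FakeElement {
    final String text;
    final String html;
    final Map<String, String> attributes = new HashMap<>();

    FakeElement(String text, String html) {
        this.text = text;
        this.html = html;
    }
}

// POJO annotated the same way as the examples in this section.
class SelectorDemoPojo {
    @Selector("#myid")
    public String myIdText;

    @Selector(".myclass")
    @HtmlValue
    public String myClassHtml;

    @Selector(".myclass")
    @AttributeValue(name = "myattr")
    public String myClassAttribute;
}

class AnnotationPopulator {
    // Creates a T and fills each @Selector field from the map of pre-matched elements.
    static <T> T populate(Class<T> type, Map<String, FakeElement> matches) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Selector selector = field.getAnnotation(Selector.class);
                if (selector == null) continue;            // field is not annotated
                FakeElement element = matches.get(selector.value());
                if (element == null) continue;             // selector matched nothing
                AttributeValue attribute = field.getAnnotation(AttributeValue.class);
                String value;
                if (attribute != null) {
                    value = element.attributes.get(attribute.name());
                } else if (field.isAnnotationPresent(HtmlValue.class)) {
                    value = element.html;
                } else {
                    value = element.text;                  // default, like @TextValue
                }
                field.set(target, value);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds stand-in matches holding the sample values from this section and populates the POJO.
    static SelectorDemoPojo demo() {
        Map<String, FakeElement> matches = new HashMap<>();
        matches.put("#myid", new FakeElement("This is my id text", "This is my id text"));
        FakeElement myClass =
                new FakeElement("This is my class text", "<span>This is my class text</span>");
        myClass.attributes.put("myattr", "grouchy");
        matches.put(".myclass", myClass);
        return populate(SelectorDemoPojo.class, matches);
    }
}
```

Calling AnnotationPopulator.demo() returns a SelectorDemoPojo whose three fields hold the text, inner HTML and attribute value respectively. A real parser would run each selector against a Jsoup Document instead of looking it up in a Map.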

@MetaName and @MetaProperty

@MetaName and @MetaProperty select Meta Tag content values by their name or property attributes. Our sample HTML has one Meta Tag with the name “twitter:image:src” and one with the property “og:image”.

@MetaName("twitter:image:src")
public String twitterImage;

Retrieves “http://twitter.image”.

_____________

@MetaProperty("og:image")
public String facebookImage;

Retrieves “http://facebook.image”.

@ImageSelector

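Our sample HTML has three images. Two sit inside a div with the id “content”, and the second of those has the class “myimage”.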

@ImageSelector(value = "#content")
public List<JsoupImage> testImagesInContentArea;

Retrieves the two images in the <div#content /> region and places each one in a JsoupImage object, which has src, alt, height and width properties. The height and width values of the second JsoupImage object would be null.

_____________

@ImageSelector(".myimage")
public JsoupImage testImage;

Retrieves the second image in the <div#content /> region because it contains “class=myimage”.

_____________

@ImageSelector
public List<JsoupImage> testImagesInPage;

Retrieves all images on the page, in our example a total of 3.
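
The three fields above differ mainly in their declared type: a List collects every match, while a single JsoupImage holds one. Below is a minimal sketch of how a binder could make that choice with reflection. Everything in it is hypothetical: ImageInfo stands in for JsoupImage, the src and alt values are made up for illustration, and taking the first match for a single-valued field is an assumption, not confirmed behavior of the library.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for JsoupImage; height and width are null when not supplied.
class ImageInfo {
    final String src;
    final String alt;
    final String height;
    final String width;

    ImageInfo(String src, String alt, String height, String width) {
        this.src = src;
        this.alt = alt;
        this.height = height;
        this.width = width;
    }
}

// POJO with one list-valued and one single-valued image field.
class ImageDemoPojo {
    public List<ImageInfo> imagesInContentArea;
    public ImageInfo singleImage;
}

class FieldShapeBinder {
    // Assigns all matches to a List field, or the first match (assumed) to a single-valued field.
    static void bind(Object target, String fieldName, List<ImageInfo> matches) {
        try {
            Field field = target.getClass().getField(fieldName);
            if (List.class.isAssignableFrom(field.getType())) {
                field.set(target, new ArrayList<>(matches));
            } else {
                field.set(target, matches.isEmpty() ? null : matches.get(0));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Binds made-up sample images; the second one has no height or width.
    static ImageDemoPojo demo() {
        ImageInfo first = new ImageInfo("/a.png", "first", "50", "60");
        ImageInfo second = new ImageInfo("/b.png", "second", null, null);
        ImageDemoPojo pojo = new ImageDemoPojo();
        bind(pojo, "imagesInContentArea", Arrays.asList(first, second));
        bind(pojo, "singleImage", Arrays.asList(second));
        return pojo;
    }
}
```

After FieldShapeBinder.demo(), the list field holds both images and the single field holds the second one, whose height and width are null.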

@LinkSelector

@LinkSelector works the same way as @ImageSelector, except that it places links in JsoupLink objects. A JsoupLink object has two properties: href and text.

@LinkSelector("#content")
public List<JsoupLink> testLinksInContentArea;

Retrieves the two href links in the <div#content /> area in the above HTML example.

Source Code Notes for this Post

All source code discussed in this post can be found in my NixMash Spring GitHub repo and viewed online here.

Posted May 26, 2016 12:27 PM EDT
