RSS, or Really Simple Syndication, is a specification for XML files to
provide syndicated data. It is typically used by news sites and blogs
to provide information concerning the latest news stories, posts, etc.,
in such a way that links to the stories can be included on other web
sites or even downloaded by news aggregator programs. Many thousands of
RSS feeds are currently available -- take a look at a site such as syndic8 to get an idea.
Informa is a relatively new open source Java API for parsing RSS files available from http://informa.sourceforge.net/.
The Informa project was the result of merging two Java-based aggregator
services: HotSheet and Risotto. This article aims to show how you can
use the Informa API to quickly access RSS feeds to add some dynamic
news and information content to your web sites.
RSS: An Overview
To begin with, we'll take a quick look at an RSS example. It's a
very simple format (hence the name Really Simple Syndication), but for
those who would like a more in-depth introduction to RSS, you could do
far worse than checking out O'Reilly's RSS site or reading Mark Pilgrim's very good overview. Here's the example:
<?xml version="1.0"?>
<!-- The version of RSS we are using -->
<rss version="0.91">
  <!-- Information about our channel -->
  <channel>
    <title>Random News</title>
    <link>http://www.randomnews.com/</link>
    <description>
      Random news from the random news website!
    </description>
    <language>en-us</language>
    <copyright>Copyright: (C) 2003 Random News.com</copyright>
    <image>
      <title>Random News Logo</title>
      <url>http://randomnews.org/images/logo88x33.gif</url>
      <link>http://randomnews.org/</link>
    </image>
    <item>
      <title>News piece one</title>
      <link>http://randomnews.org/getnews.pl?article=1</link>
    </item>
    <item>
      <title>News piece two</title>
      <link>http://randomnews.org/getnews.pl?article=2</link>
    </item>
  </channel>
</rss>
This example uses version 0.91 of the RSS specification. Every feed
needs a channel that describes the source of the information, and there
is exactly one channel per XML document. Without going into too much
detail of this format, this is how you describe a channel:
<channel>
  <title>Random News</title>
  <link>http://www.randomnews.com/</link>
  <description>
    Random news from the random news website!
  </description>
  <language>en-us</language>
  <copyright>Copyright: (C) 2003 Random News.com</copyright>
</channel>
The following defines an image provided by the site.
<image>
  <title>Random News Logo</title>
  <url>http://randomnews.org/images/logo88x33.gif</url>
  <link>http://randomnews.org/</link>
</image>
The <item> blocks are the real meat of the file. Each one gives us the
title of a piece of information, a link to the original post, and
optionally, a description of the post. This is by no means all of the
data that an RSS file may provide, but this is enough for our purposes.
<item>
  <title>News piece one</title>
  <link>http://randomnews.org/getnews.pl?article=1</link>
  <description>It's an article</description>
</item>
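To make the structure concrete before turning to Informa, here is a minimal sketch that pulls the title and link out of each <item> using nothing but the JDK's built-in DOM parser. It is not how Informa works internally, and the names RssItemPeek and itemTitlesAndLinks are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: RssItemPeek is a hypothetical helper, not part of Informa.
class RssItemPeek {

    // Returns one "title -> link" string per <item> element in the feed.
    static List<String> itemTitlesAndLinks(String rssXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML feed", e);
        }
        List<String> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.add(childText(item, "title") + " -> " + childText(item, "link"));
        }
        return result;
    }

    // Text of the first nested element with the given name, or "" if absent.
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String feed = "<rss version=\"0.91\"><channel>"
                + "<title>Random News</title>"
                + "<item><title>News piece one</title>"
                + "<link>http://randomnews.org/getnews.pl?article=1</link></item>"
                + "<item><title>News piece two</title>"
                + "<link>http://randomnews.org/getnews.pl?article=2</link></item>"
                + "</channel></rss>";
        for (String line : itemTitlesAndLinks(feed)) {
            System.out.println(line);
        }
    }
}
```

Doing this by hand for every RSS version quickly gets tedious, which is exactly the work a dedicated parsing API takes off our hands.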
There are thousands of news sites and blogs out there with feeds
available in this format. Just think -- instead of doing the normal
morning check on Slashdot, Freshmeat, or wherever, what if their
content was delivered straight to your own personal portal, or RSS
aggregate service? Implementing such a solution is very simple -- in
the rest of the article we'll look at how we can process this data and
display it on our JSP pages.
Reading RSS: The Informa API
Currently at version 0.3.0, Informa works perfectly well at reading
RSS versions 0.91, 0.92, 1.0, and 2.0. Let's have a quick look at its
usage:
try {
    // Point at the feed to load (a local file here)
    URL feed = new URL("file:/C:/samplefeed.rss");
    // Work out which version of RSS the feed uses
    ChannelFormat format = FormatDetector.getFormat(feed);
    // Ask for the parser that matches that version
    ChannelParserCollection parsers =
        ChannelParserCollection.getInstance();
    ChannelParserIF parser =
        parsers.getParser(format, feed);
    // The default builder creates the channel in memory
    parser.setBuilder(new ChannelBuilder());
    ChannelIF channel = parser.parse();
    // Print the title of every item in the channel
    for (Iterator iter = channel.getItems().iterator();
         iter.hasNext();) {
        ItemIF item = (ItemIF) iter.next();
        System.out.println(item.getTitle());
    }
} catch (MalformedURLException mue) {
    mue.printStackTrace();
} catch (UnsupportedFormatException ufe) {
    ufe.printStackTrace();
} catch (ParseException pe) {
    pe.printStackTrace();
}
This simple example gets the RSS feed and prints out the title of each
news item. This small piece of code will form the basis for much of what
follows, so it's worth going over in detail. Begin by creating a URL
object that points to the feed to be loaded. We then use the handy
FormatDetector class to determine which version of RSS the feed uses,
and get hold of the parser collection that will supply the relevant parser.
URL feed = new URL("file:/C:/samplefeed.rss");
ChannelFormat format = FormatDetector.getFormat(feed);
ChannelParserCollection parsers =
ChannelParserCollection.getInstance();
Next, we get the correct parser for our feed type (there is one per
supported version of the RSS specification) and create a default
builder object for the parser. In Informa, a builder object is
responsible for the creation and storage of a feed. Currently in
development is a Hibernate Builder, which will allow database
persistence of a feed. Here, the default ChannelBuilder
is used, which simply creates an in-memory feed.
ChannelParserIF parser = parsers.getParser(format, feed);
parser.setBuilder(new ChannelBuilder());
Finally, we parse the document to create a bean representing an RSS
channel.
ChannelIF channel = parser.parse();
Now, we could embed this code directly into our JSP code as a
scriptlet, but this is not best practice. Instead, we are going to
produce a reusable custom tag that will allow us to display any named
feed.
RSS Custom Tags
Let's start by looking at how our tag will look in our JSP page when requesting a feed from the BBC:
<%@ taglib prefix="rss" uri="/WEB-INF/rsstaglib.tld" %>
<rss:simpleRssFeed uri="http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml" />
Pretty simple -- we have a tag with one required attribute that names the feed. Now let's look at the code:
public class SimpleRssFeedTag extends TagSupport {
    private String uri;

    public String getUri() {
        return uri;
    }

    public void setUri(String uri) {
        this.uri = uri;
    }

    public int doEndTag() throws JspException {
        JspWriter out = pageContext.getOut();
        try {
            // Load and parse the feed named by the uri attribute
            URL feed = new URL(getUri());
            ChannelParserCollection parsers =
                ChannelParserCollection.getInstance();
            ChannelFormat format =
                FormatDetector.getFormat(feed);
            ChannelParserIF parser =
                parsers.getParser(format, feed);
            parser.setBuilder(new ChannelBuilder());
            ChannelIF channel = parser.parse();
            // Channel title in bold, then one link per item
            out.print("<b>" + channel.getTitle() + "</b><br />");
            for (Iterator iter = channel.getItems().iterator();
                 iter.hasNext();) {
                ItemIF item = (ItemIF) iter.next();
                out.print("<a href=\"" + item.getLink() + "\">");
                out.println(item.getTitle() + "</a><br />");
            }
        } catch (MalformedURLException mue) {
            throw new JspException(mue);
        } catch (UnsupportedFormatException ufe) {
            throw new JspException(ufe);
        } catch (ParseException pe) {
            throw new JspException(pe);
        } catch (IOException e) {
            throw new JspException(e);
        }
        return EVAL_PAGE;
    }
}
This time, rather than printing the titles to the command prompt, we
are formatting our links and titles as HTML. Let's look at an example
page where we are requesting a couple of feeds, say, the BBC's world
news and java.sun.com's technology highlights. Use this tag in a
JSP page as follows:
<%@ taglib prefix="rss" uri="/WEB-INF/rsstaglib.tld" %>
<table>
<tr>
<td>
<rss:simpleRssFeed uri="http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml" />
</td>
<td>
<rss:simpleRssFeed uri="http://servlet.java.sun.com/syndication/rss_java_highlights-PARTNER-20.xml" />
</td>
</tr>
</table>
And the result:
Figure 1. Our first RSS tag in action
We have managed to display the news items, and clicking on the links
will take you to the articles. Whenever the syndicates update their RSS
files, your page will change too! As an exercise, consider limiting the
number of posts or adding the display of a syndicate's logo to this
basic tag.
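As a starting point for the first part of that exercise, here is a rough sketch of how the rendering loop might cap the number of links it writes. FeedLinkRenderer and Link are hypothetical stand-ins invented for this example so that it runs without a servlet container; in the real tag the titles and links would come from Informa's ItemIF objects, and the limit from an extra tag attribute. The escaping step is an additional precaution that the tag above does not take.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: FeedLinkRenderer and Link are hypothetical stand-ins,
// not Informa classes.
class FeedLinkRenderer {

    static final class Link {
        final String title;
        final String url;

        Link(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    // Renders at most maxItems links, one per line, as HTML anchors.
    static String render(List<Link> links, int maxItems) {
        StringBuilder html = new StringBuilder();
        int count = Math.min(Math.max(maxItems, 0), links.size());
        for (Link link : links.subList(0, count)) {
            html.append("<a href=\"").append(escape(link.url)).append("\">")
                .append(escape(link.title)).append("</a><br />\n");
        }
        return html.toString();
    }

    // Escapes the characters that are unsafe in HTML text and attribute values.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        links.add(new Link("News piece one", "http://randomnews.org/getnews.pl?article=1"));
        links.add(new Link("News piece two", "http://randomnews.org/getnews.pl?article=2"));
        links.add(new Link("News piece three", "http://randomnews.org/getnews.pl?article=3"));
        // Only the first two links are written out
        System.out.print(render(links, 2));
    }
}
```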
A More Refined Tag
Currently, we are doing too much of the actual formatting of the
display in the tag itself. This is inconvenient, as it means that in
order to change the formatting of the tag's results, we need to change
the code. It would be much better if we could leave the mechanics of
reading the feeds up to the tag, and have all of the formatting in the
JSP. In order to achieve this, we need to allow the web designer to
decide what parts of an RSS channel are required, and embed them in
standard HTML.
The JSP Standard Tag Library (JSTL) introduced a simple Expression
Language (EL), which allows us to quickly and easily access JavaBean
properties at runtime. We are using the JSTL EL for accessing beans and
displaying properties, which is fairly straightforward. For example, to
print out the name property of a bean, we would do the following:
<c:out value="${bean.name}"/>
Here, the bean is a JavaBean available in the page. The <c:out> tag is
used to retrieve the returned value of the expression ${bean.name}
and print it to the output stream. Our new custom tag is going to
expose the RSS feed as a series of beans, and then use the JSTL EL to
access and display its data. Let's look at an example use of our new tag:
<rss:readFeed uri="http://today.java.net/pub/q/weblogs_rss?x-ver=1.0" var="channel">
  <strong><c:out value="${channel.title}"/></strong>
  <ol>
    <c:forEach var="item" items="${channel.items}">
      <li>
        <a href="<c:out value="${item.link}"/>">
          <c:out value="${item.title}"/></a>
      </li>
    </c:forEach>
  </ol>
</rss:readFeed>
The <rss:readFeed> tag reads the feed and stores its channel in the
page scope as a bean named channel. The ${channel.title} expression
gets the title property and displays it. Next, we use a standard
<c:forEach> tag to iterate over the items property of the channel bean,
using the JSTL EL to display each item's title and link. As you can see,
all of the formatting is done by the JSP code itself -- here we create an
HTML ordered list of the channel's items, but this could as easily be a
series of <div>s, table rows, or whatever.
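An expression such as ${channel.title} works because the channel object follows the JavaBean convention: a property called title is backed by a getTitle() method. The sketch below shows that kind of lookup using the JDK's java.beans.Introspector. It only illustrates the convention and is not the JSTL implementation; BeanPropertyDemo, its Channel class, and readProperty are hypothetical names made up for the example.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

// Illustrative only: a hypothetical demo of JavaBean property lookup,
// not the JSTL's own code.
class BeanPropertyDemo {

    // A tiny stand-in bean with a single read-only "title" property.
    public static class Channel {
        private final String title;

        public Channel(String title) {
            this.title = title;
        }

        public String getTitle() {
            return title;
        }
    }

    // Finds the getter for the named property and returns its value.
    static Object readProperty(Object bean, String name) {
        try {
            BeanInfo info = Introspector.getBeanInfo(bean.getClass());
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (pd.getName().equals(name) && getter != null) {
                    return getter.invoke(bean);
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read property " + name, e);
        }
        throw new IllegalArgumentException("No readable property: " + name);
    }

    public static void main(String[] args) {
        Channel channel = new Channel("Random News");
        // Roughly what evaluating ${channel.title} boils down to
        System.out.println(readProperty(channel, "title"));
    }
}
```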
Surprisingly, the code for this tag isn't much more complicated than the original example. Let's take a look:
public class RefinedRssFeedTag extends TagSupport {
    private static final ChannelBuilder DEFAULT_BUILDER = new ChannelBuilder();
    private static final ChannelParserCollection PARSERS =
        ChannelParserCollection.getInstance();

    private String uri;
    private String var;
    private ChannelIF channel;

    public String getVar() {
        return var;
    }

    public void setVar(String var) {
        this.var = var;
    }

    public String getUri() {
        return uri;
    }

    public void setUri(String uri) {
        this.uri = uri;
    }

    public int doStartTag() throws JspException {
        try {
            // Load and parse the feed named by the uri attribute
            URL feed = new URL(getUri());
            ChannelFormat format =
                FormatDetector.getFormat(feed);
            ChannelParserIF parser =
                PARSERS.getParser(format, feed);
            parser.setBuilder(DEFAULT_BUILDER);
            channel = parser.parse();
            // Store the channel in the page under the name given by var
            pageContext.setAttribute(getVar(), channel);
        } catch (MalformedURLException mue) {
            throw new JspException(mue);
        } catch (UnsupportedFormatException ufe) {
            throw new JspException(ufe);
        } catch (ParseException pe) {
            throw new JspException(pe);
        }
        // Evaluate the tag body, which can now refer to the stored bean
        return EVAL_BODY_INCLUDE;
    }
}
The main work is done in the doStartTag method. We parse the RSS file
specified in the uri attribute, and then we store it in the pageContext
under the name specified by the var attribute (this is standard practice
throughout the JSTL). This allows ${channel} to be used in the tag body.
And that's pretty much it! Now let's use it to view a couple of feeds --
two of a computer programmer's best friends, Slashdot and Freshmeat.
<rss:readFeed uri="http://slashdot.org/slashdot.rss" var="channel">
  <img src="<c:out value="${channel.image.location}"/>">
  <a href="<c:out value="${channel.image.location}"/>">
    <strong><c:out value="${channel.title}"/></strong></a>
  <ol>
    <c:forEach var="item" items="${channel.items}">
      <li><a href="<c:out value="${item.link}"/>">
        <c:out value="${item.title}"/></a></li>
    </c:forEach>
  </ol>
</rss:readFeed>
<rss:readFeed
    uri="http://freshmeat.net/backend/fm-releases-software.rdf" var="channel">
  <strong><c:out value="${channel.title}"/></strong><br />
  <a href="<c:out value="${channel.location}"/>">[Feed]</a><br />
  <c:forEach var="item" items="${channel.items}">
    <a href="<c:out value="${item.link}"/>">
      <c:out value="${item.title}"/></a><br />
  </c:forEach>
</rss:readFeed>
When reading the Slashdot feed, we format the title as a link right
back to Slashdot itself, with the rest of the items formatted as a
standard HTML ordered list. With Freshmeat, we know they don't provide
an image, so we ignore that, but we also provide a link back to the
source of the RSS itself, with the items shown as simple links separated
by <br /> tags. The result can be seen below.
Figure 2. Example use of a more complex RSS Tag
Conclusion
This article has shown how you can quickly and simply create an RSS
tag that lets you insert RSS feeds while keeping to your site's current
design. In no way should this be considered the final word -- the
RefinedRssFeedTag as presented here is far from perfect. Most
importantly, no caching of the requested feeds is done, so each feed is
loaded and parsed every time the tag is run. Over the course of the next
several articles, we will look at approaches that improve upon the
solutions provided here, and will also look at other ways in which we
can use RSS to enrich our software.
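To give a flavor of what such caching might involve, here is one possible sketch of a time-based cache keyed by feed URI. It is only an assumption about how the problem could be tackled, not the approach the follow-up articles take: TimedFeedCache is a made-up class, the loader function stands in for the Informa parsing code shown earlier, and the clock is passed in so the expiry logic can be exercised without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Illustrative only: TimedFeedCache is a hypothetical class, not part of Informa.
class TimedFeedCache<V> {

    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<String, V> loader;
    private final LongSupplier clock;

    // loader does the expensive work (fetching and parsing a feed);
    // clock supplies the current time in milliseconds.
    TimedFeedCache(long ttlMillis, Function<String, V> loader, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
        this.clock = clock;
    }

    // Returns the cached value, reloading it once it is ttlMillis old or older.
    // Two threads may occasionally reload the same feed at once, which is
    // wasteful but harmless for this sketch.
    V get(String uri) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(uri);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            entry = new Entry<>(loader.apply(uri), now);
            entries.put(uri, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        TimedFeedCache<String> cache = new TimedFeedCache<>(
                15 * 60 * 1000L,
                uri -> "parsed channel for " + uri, // stand-in for real parsing
                System::currentTimeMillis);
        System.out.println(cache.get("http://slashdot.org/slashdot.rss"));
        // The second call is served from memory, without calling the loader
        System.out.println(cache.get("http://slashdot.org/slashdot.rss"));
    }
}
```

A tag could hold one such cache in a static field and ask it for the channel instead of parsing the feed on every request.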
Sam Newman is a Java programmer. Check out his blog at magpiebrain.com.