Testing Solr Apps

Solr provides an embedded server that makes writing unit tests easy. But how do you modify your code to use the embedded server in your tests? The trick is to use an anonymous class, which lets you swap out the regular server for an embedded one. First you need to create an embedded […]
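
The anonymous-class trick can be sketched without any Solr dependency. In the minimal Java example below, every name (`SearchServer`, `InMemoryServer`, `SearchService`, `createServer`) is a hypothetical stand-in invented for illustration, not the SolrJ API. The assumption is that the application creates its server in one overridable method, so a test can replace it with an anonymous subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Solr client interface; NOT the SolrJ API.
interface SearchServer {
    void add(String doc);
    List<String> query(String term);
}

// Hypothetical in-memory server that plays the role of the embedded server in tests.
class InMemoryServer implements SearchServer {
    private final List<String> docs = new ArrayList<>();

    public void add(String doc) {
        docs.add(doc);
    }

    public List<String> query(String term) {
        List<String> hits = new ArrayList<>();
        for (String d : docs) {
            if (d.contains(term)) {
                hits.add(d);
            }
        }
        return hits;
    }
}

// Application code: all server access goes through createServer().
class SearchService {
    private SearchServer server;

    // Production code would build a client for the regular server here.
    protected SearchServer createServer() {
        throw new IllegalStateException("no regular server configured in this sketch");
    }

    // Create the server lazily so that a subclass override is picked up.
    private SearchServer server() {
        if (server == null) {
            server = createServer();
        }
        return server;
    }

    public void index(String doc) {
        server().add(doc);
    }

    public int countMatches(String term) {
        return server().query(term).size();
    }
}

class EmbeddedSwapDemo {
    // Test-side factory: the anonymous subclass overrides only createServer().
    static SearchService serviceForTests() {
        return new SearchService() {
            @Override
            protected SearchServer createServer() {
                return new InMemoryServer();
            }
        };
    }

    public static void main(String[] args) {
        SearchService service = serviceForTests();
        service.index("solr embedded test");
        service.index("unrelated text");
        System.out.println(service.countMatches("solr"));
    }
}
```

The appeal of this design is that the production class needs no test-only constructor or setter; the test overrides a single factory method and leaves the rest of the code path untouched.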

Text Summarization

Summarizing text is difficult. Some attempts have been made, but they are often clunky. One that works OK is TextTeaser (https://github.com/MojoJolo/textteaser). It uses an algorithm called Density Based Selection to identify important sentences. It’s considered a selective text summarizer because it selects relevant sentences from the original text. Abstract summarizers attempt to summarize […]

Machine Learning with R

R is a popular language for machine learning, and getting started is pretty easy. First, install R on your local machine. Then, try running the script below. You may need to install the two packages first. The script uses the k-nearest neighbors (KNN) classification algorithm to learn which features or attributes may identify cancerous tumors. […]
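
To illustrate the idea behind k-nearest neighbors independently of the R script, here is a minimal Java sketch. It is not the post's script and uses no R packages. The class and method names (`KnnSketch`, `classify`, `toyData`) are hypothetical, and the feature values are made-up toy numbers for illustration, not real tumor measurements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnSketch {
    // One labeled training example: numeric features plus a class label.
    static final class Sample {
        final double[] features;
        final String label;

        Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classify the query by majority vote among its k nearest training samples.
    // On a tie, the label that reaches the top count first (nearest-first) wins.
    static String classify(List<Sample> training, double[] query, int k) {
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Sample s) -> distance(s.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Sample s : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(s.label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = s.label;
            }
        }
        return best;
    }

    // Made-up toy values for illustration only; not real measurements.
    static List<Sample> toyData() {
        List<Sample> data = new ArrayList<>();
        data.add(new Sample(new double[] {1.0, 1.0}, "benign"));
        data.add(new Sample(new double[] {1.2, 0.8}, "benign"));
        data.add(new Sample(new double[] {0.9, 1.1}, "benign"));
        data.add(new Sample(new double[] {3.0, 3.2}, "malignant"));
        data.add(new Sample(new double[] {3.1, 2.9}, "malignant"));
        data.add(new Sample(new double[] {2.8, 3.0}, "malignant"));
        return data;
    }

    public static void main(String[] args) {
        System.out.println(classify(toyData(), new double[] {1.1, 1.0}, 3));
    }
}
```

In practice the features are usually rescaled to a common range before computing distances, so that no single attribute dominates the vote; this sketch skips that step to stay short.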

Spark and Avro

To process Avro files with Spark, you need to register a serializer with Kryo. Because Spark is commonly configured to use Kryo for serialization, you need to instruct Kryo to use Avro when serializing your Avro objects. Below is an example of a registered serializer using Groovy:

Read more "Spark and Avro"