Solr provides an embedded server that makes writing unit tests easy. But how do you modify your code to use the embedded server for your tests? The trick is to use an anonymous class. You can switch out the regular server with an embedded one using this method. First you need to create an embedded […] Read more "Testing Solr Apps"
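The anonymous-class trick can be sketched without any Solr dependency. In the minimal example below, `SearchServer`, `InMemoryServer`, `Indexer`, and `createServer()` are all hypothetical stand-ins invented for illustration, not SolrJ types; the assumption is that the production class obtains its server through an overridable factory method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a search server client; not a real SolrJ type.
interface SearchServer {
    void add(String doc);
    int count();
}

// Hypothetical in-memory server that plays the role of the embedded server.
class InMemoryServer implements SearchServer {
    private final List<String> docs = new ArrayList<>();

    @Override
    public void add(String doc) {
        docs.add(doc);
    }

    @Override
    public int count() {
        return docs.size();
    }
}

// Production class: the server comes from a factory method a subclass can override.
class Indexer {
    protected SearchServer createServer() {
        // A real application would connect to the regular server here.
        throw new IllegalStateException("no regular server available in this sketch");
    }

    int index(List<String> docs) {
        SearchServer server = createServer();
        for (String doc : docs) {
            server.add(doc);
        }
        return server.count();
    }
}

public class IndexerSketch {
    // Test helper: an anonymous subclass swaps in the embedded stand-in.
    static Indexer testIndexer() {
        final SearchServer embedded = new InMemoryServer();
        return new Indexer() {
            @Override
            protected SearchServer createServer() {
                return embedded;
            }
        };
    }

    public static void main(String[] args) {
        int indexed = testIndexer().index(Arrays.asList("doc one", "doc two"));
        System.out.println(indexed); // prints 2
    }
}
```

The production code stays untouched: only the test constructs the anonymous subclass, so the rest of `Indexer` runs exactly as it would against the regular server.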
Summarizing text is difficult. Some attempts have been made, but they are often clunky. One that works OK is called TextTeaser and can be found at https://github.com/MojoJolo/textteaser. It uses an algorithm called Density Based Selection to identify important sentences. It’s considered a selective text summarizer because it selects relevant sentences. Abstract summarizers attempt to summarize […] Read more "Text Summarization"
Getting started with entity extraction and Stanford CoreNLP takes just a few steps. Grab the property file and NLP models from the Stanford CoreNLP GitHub repo, all.3class.distsim.prop and all.3class.distsim.crf.ser.gz, and then run the Groovy code below: Read more "Entity Extraction with Stanford CoreNLP"
R is a popular language for machine learning. Getting started is pretty easy. First, install R on your local machine. Then, try running the script below. You may need to install the two packages first. The script uses the K Nearest Neighbor classification algorithm to learn what features or attributes may identify cancerous tumors. […] Read more "Machine Learning with R"
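To show the idea behind K Nearest Neighbor classification, here is a minimal self-contained sketch in Java. It is not the R script from the post: the two features, the labels, and every data point below are made up purely for illustration, and the distance measure (Euclidean) and majority vote are assumptions about a typical setup.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class KnnSketch {
    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Label the query by majority vote among its k nearest training points.
    static String classify(double[][] features, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[features.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by distance to the query, nearest first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(features[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int i = 0; i < Math.min(k, order.length); i++) {
            String label = labels[order[i]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented toy data: two hypothetical, already scaled tumor measurements.
        double[][] features = {
            {1.0, 1.0}, {1.2, 0.9}, {0.8, 1.1},
            {3.0, 3.2}, {3.1, 2.9}, {2.9, 3.0}
        };
        String[] labels = {
            "benign", "benign", "benign",
            "malignant", "malignant", "malignant"
        };
        System.out.println(classify(features, labels, new double[] {2.8, 3.1}, 3)); // prints malignant
    }
}
```

With real data the features would normally be scaled to a common range first, since the distance calculation otherwise favors attributes with large numeric values.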
To process Avro files with Spark you need to register a serializer with Kryo. Because Spark generally uses Kryo for serialization, you need to instruct Kryo to use Avro to serialize your Avro objects. Below is an example of a registered serializer using Groovy: Read more "Spark and Avro"
Using OpenNLP to extract proper nouns is pretty easy. Here’s some code showing how to do it: Read more "Using OpenNLP to Find People, Places and Organizations"
Querying SolrCloud is pretty easy. Here’s a simple script: Read more "Querying SolrCloud"