Channel: SearchHub | Lucene/Solr Open Source Search » Tika

Thoughts on Efficiency of Enterprise Search on eWeek.com

eWeek.com recently posted a nice article by Dr. Yves Schabes, founder of Teragram, on how to make enterprise search better through some higher-order processing techniques like metadata generation,...

The Apache Lucene Ecosystem: My view of 2009

It’s that time of year, so I thought I would take a look back at the year that was for the Lucene Ecosystem and maybe look ahead just a little bit too. First and foremost, it should be obvious to even...

Apache Lucene Connector Framework now in Incubation at the ASF

Short version: The Apache Lucene Connector Framework project has officially entered incubation. LCF, for short, is going to be a framework for connecting to content repositories like SharePoint,...

News Flash: Apache Lucene gives birth to triplets!

Apache Lucene (the Lucene top-level project, not Lucene the Java search API; I know, it’s confusing sometimes) has once again proved to be a fertile area for innovation (having already given birth to...

Extending Apache Tika Capabilities

Apache Tika is a toolkit for extracting metadata and textual content from various document formats. Tika itself provides implementations for parsing some document formats, while it relies on external...

The Apache Lucene Ecosystem: My View of 2010

After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem. For anyone who follows the...

Indexing rich files into Solr, quickly and easily

This past weekend I gave yet another “Rapid Prototyping with Solr” presentation, this time back in the saddle with the No Fluff, Just Stuff symposium in Raleigh, NC. I intentionally waited until...

Indexing with SolrJ

Two popular methods of indexing existing data are the Data Import Handler (DIH) and Tika (Solr Cell)/ExtractingRequestHandler. These can be used to index data from a database or structured documents...

Scaling Solr Indexing with SolrCloud, Hadoop and Behemoth

We’ve been doing a lot of work at Lucid lately on scaling out Solr, so I thought I would blog about some of the things we’ve been working on recently and how they might help you handle large indexes...
