In a production system, running complex indexing on the Solr server itself is a bad idea, because the server is busy serving queries. In addition, while Solr has a very large number of extensibility points, it is sometimes easier to do the heavy pre-processing outside of Solr, in a long-running batch job or even completely offline. The final results can then be sent to Solr in the format it expects.
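As a rough illustration of pre-processing outside Solr, the sketch below cleans up a record in a separate Java process and then builds a JSON update message for Solr's `/update` handler. It uses only the JDK (Java 11 or later) rather than any Solr library. The base URL, the collection name `articles`, the field names, and the `normalizeTitle` helper are all assumptions made up for this example, not part of any real setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OfflinePreprocess {

    // Hypothetical pre-processing step: trim and collapse whitespace in a raw title.
    static String normalizeTitle(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    // Minimal JSON string escaping, enough for this sketch.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize the documents as a JSON array of flat objects.
    static String toUpdatePayload(List<Map<String, String>> docs) {
        return docs.stream()
                .map(doc -> doc.entrySet().stream()
                        .map(e -> jsonString(e.getKey()) + ":" + jsonString(e.getValue()))
                        .collect(Collectors.joining(",", "{", "}")))
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Build (but do not send) the POST request for the update handler.
    static HttpRequest buildRequest(String baseUrl, String collection, String payload) {
        URI uri = URI.create(baseUrl + "/" + collection + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");                                    // assumed field name
        doc.put("title", normalizeTitle("  Hello   world ")); // assumed field name

        String payload = toUpdatePayload(List.of(doc));
        System.out.println(payload);

        // Send only when a base URL is given, e.g. http://localhost:8983/solr (assumed).
        if (args.length > 0) {
            HttpRequest request = buildRequest(args[0], "articles", payload);
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }
}
```

Run without arguments, the program only prints the payload, so the expensive part of the work can be developed and checked without touching a live server. A real pipeline would more likely batch many documents per request and commit less often than on every call.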
In some cases, it may also make sense to create a standalone collection, populate it in the most efficient way, and only then add it to the production server by copying over the data directory or swapping out a collection.
Solr supports both of these scenarios with the Java client library SolrJ. It can be run as a pure ...