Interface SnapshotConsumer


  • public interface SnapshotConsumer

    A SnapshotConsumer allows downloading and consuming the entities of a snapshot, given the URL of a snapshot index.

    Consuming a snapshot is a two-step process:

    1. download the snapshot index using loadSnapshotIndex(Url)
    2. pass the returned SnapshotIndex to either streamPages(SnapshotIndex) or streamEntities(SnapshotIndex) to download and stream the actual pages
    Which of the two stream* methods to use depends on your requirements:
    • streamEntities(SnapshotIndex) hides the page breaks and provides a flat stream of fully downloaded entities. This is the recommended method: the returned entities can be parsed directly and, for example, stored in a database, which is usually the level of abstraction consumers work at.
    • streamPages(SnapshotIndex) returns a stream of StreamingPage objects instead of entities. These allow access to page headers and allow incremental processing of entity bodies (compared to always downloading an entity in full before processing it). This interface provides more fine-grained control over how pages are downloaded, but is more difficult to use than the flat stream of entities.
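
    The relationship between the two views can be illustrated with a small, self-contained sketch. It does not use the real library types: Entity, Page, and sampleSnapshot() are hypothetical stand-ins invented for this illustration, and the real StreamingPage presumably exposes headers and streaming bodies rather than an in-memory list. The sketch only shows the idea that a flat entity stream can be thought of as the page stream with the page boundaries flattened away.

```java
import java.util.List;
import java.util.stream.Stream;

public class PageVsEntityStreams {
    // Hypothetical stand-ins; the real entity and StreamingPage types may look different.
    record Entity(String id, String body) {}
    record Page(int number, List<Entity> entities) {}

    // A tiny in-memory "snapshot" with two pages, used instead of a real download.
    static List<Page> sampleSnapshot() {
        return List.of(
            new Page(0, List.of(new Entity("a", "{}"), new Entity("b", "{}"))),
            new Page(1, List.of(new Entity("c", "{}"))));
    }

    // Page-level view: the caller sees every page boundary.
    static Stream<Page> streamPages(List<Page> snapshot) {
        return snapshot.stream();
    }

    // Entity-level view: page breaks are hidden by flattening the pages.
    static Stream<Entity> streamEntities(List<Page> snapshot) {
        return streamPages(snapshot).flatMap(page -> page.entities().stream());
    }

    public static void main(String[] args) {
        streamPages(sampleSnapshot()).forEach(p ->
            System.out.println("page " + p.number() + ": " + p.entities().size() + " entities"));
        streamEntities(sampleSnapshot()).forEach(e ->
            System.out.println("entity " + e.id()));
    }
}
```

    In this simplified model, a consumer that only needs the entities never has to know how many pages there were, which is the convenience the flat stream is meant to offer.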

    A snapshot consumer is created using a builder: call the builder() method to create a new builder with default settings, call the methods on SnapshotConsumer.Builder to customize the consumer, then call SnapshotConsumer.Builder.build() to create a new SnapshotConsumer instance.
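
    The builder and the two-step flow described above might be combined roughly as in the following sketch. It is a mock, not the library itself: the nested SnapshotConsumer, Builder, and SnapshotIndex types are stand-ins that only reuse the method names from this page, the URL is passed as a plain String, entities are plain strings, the entitiesPerPage setting is hypothetical, and nothing is downloaded. The real signatures, return types, and builder options may differ.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SnapshotConsumerSketch {
    // Hypothetical stand-in for the snapshot index: just a list of page URLs.
    record SnapshotIndex(List<String> pageUrls) {}

    // Stand-in interface reusing the method names from the description above.
    interface SnapshotConsumer {
        SnapshotIndex loadSnapshotIndex(String url);
        Stream<String> streamEntities(SnapshotIndex index);

        static Builder builder() {
            return new Builder();
        }

        class Builder {
            private int entitiesPerPage = 2; // hypothetical setting with a default

            Builder entitiesPerPage(int n) {
                this.entitiesPerPage = n;
                return this;
            }

            SnapshotConsumer build() {
                final int perPage = entitiesPerPage;
                // In-memory fake: fabricates two pages and their entities.
                return new SnapshotConsumer() {
                    @Override
                    public SnapshotIndex loadSnapshotIndex(String url) {
                        return new SnapshotIndex(List.of(url + "/pages/0", url + "/pages/1"));
                    }

                    @Override
                    public Stream<String> streamEntities(SnapshotIndex index) {
                        return index.pageUrls().stream().flatMap(page ->
                            IntStream.range(0, perPage).mapToObj(i -> page + "#" + i));
                    }
                };
            }
        }
    }

    // Step 0: build the consumer. Step 1: load the index. Step 2: stream the entities.
    static List<String> consume(String indexUrl) {
        SnapshotConsumer consumer = SnapshotConsumer.builder().entitiesPerPage(3).build();
        SnapshotIndex index = consumer.loadSnapshotIndex(indexUrl);
        try (Stream<String> entities = consumer.streamEntities(index)) {
            return entities.toList();
        }
    }

    public static void main(String[] args) {
        consume("https://example.org/snapshot/index").forEach(System.out::println);
    }
}
```

    The stream is closed with try-with-resources as a precaution; with a real consumer that holds network connections open while streaming, closing the stream when done is likely to matter more than it does in this mock.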