Interface SnapshotConsumer
-
public interface SnapshotConsumer
A SnapshotConsumer allows downloading and consuming the entities of a snapshot, given the URL of a snapshot index.
Consuming a snapshot is a two-step process:
- download the snapshot index using loadSnapshotIndex(Url)
- pass the returned SnapshotIndex to either streamPages(SnapshotIndex) or streamEntities(SnapshotIndex) to download and stream the actual pages
Which of the stream* methods to use depends on your requirements:
- streamEntities(SnapshotIndex) hides the page breaks and provides a flat stream of fully downloaded entities. This is the recommended method because the returned entities can be parsed directly and, for example, stored in a database, which is usually the level of abstraction consumers work at.
- streamPages(SnapshotIndex) returns a stream of StreamingPage objects instead of entities. These give access to page headers and allow incremental processing of entity bodies (as opposed to always downloading an entity in full before processing it). This method provides more fine-grained control over how pages are downloaded, but is more difficult to use than the flat stream of entities.
A snapshot consumer is created using a builder: call the builder() method to create a new builder with default settings, call the methods on SnapshotConsumer.Builder to customize the consumer, then call SnapshotConsumer.Builder.build() to create a new SnapshotConsumer instance.
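Both stream* methods return a java.util.concurrent.Flow.Publisher, so consuming a snapshot comes down to subscribing to that publisher. The following is a minimal sketch of a subscriber that collects all items into a list while requesting one item at a time. It uses only the JDK: FlowConsumptionSketch and CollectingSubscriber are hypothetical names, and a SubmissionPublisher of strings stands in for the publisher that streamEntities(SnapshotIndex) would return.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class FlowConsumptionSketch {

    // Collects every item of a Flow.Publisher into a list.
    // Hypothetical helper; nothing here is specific to SnapshotConsumer.
    static final class CollectingSubscriber<T> implements Flow.Subscriber<T> {
        private final List<T> items = new CopyOnWriteArrayList<>();
        private final CompletableFuture<List<T>> done = new CompletableFuture<>();
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // ask for the first item only
        }

        @Override
        public void onNext(T item) {
            items.add(item);
            subscription.request(1); // ask for the next item once this one is handled
        }

        @Override
        public void onError(Throwable error) {
            done.completeExceptionally(error); // the stream terminated with an error
        }

        @Override
        public void onComplete() {
            done.complete(List.copyOf(items)); // the stream is exhausted
        }

        CompletableFuture<List<T>> result() {
            return done;
        }
    }

    // Stand-in for subscribing to consumer.streamEntities(index):
    // a SubmissionPublisher that emits three strings and then completes.
    static List<String> collectDemo() {
        CollectingSubscriber<String> subscriber = new CollectingSubscriber<>();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            publisher.submit("entity-1");
            publisher.submit("entity-2");
            publisher.submit("entity-3");
        } // close() signals onComplete after the submitted items are delivered
        return subscriber.result().orTimeout(5, TimeUnit.SECONDS).join();
    }

    public static void main(String[] args) {
        System.out.println(collectDemo());
    }
}
```

With a real consumer, the same kind of subscriber could presumably be passed to the publisher returned by streamEntities(SnapshotIndex). Production code would more likely process each entity in onNext than buffer the whole snapshot in memory.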
Nested Class Summary
Modifier and Type | Interface | Description
static class | SnapshotConsumer.Builder | A builder for SnapshotConsumer.
-
Method Summary
Modifier and Type | Method | Description
static @NonNull SnapshotConsumer.Builder | builder() | Create a new SnapshotConsumer.Builder with default settings.
@NonNull java.util.concurrent.CompletionStage<@NonNull SnapshotIndex> | loadSnapshotIndex(@NonNull Url url) | Download and parse the snapshot index at the given URL.
java.util.concurrent.Flow.Publisher<@NonNull Entity<@NonNull SnapshotEntityHeader>> | streamEntities(@NonNull SnapshotIndex snapshotIndex) | Return a stream of entities in the given SnapshotIndex.
java.util.concurrent.Flow.Publisher<@NonNull StreamingPage<@NonNull SnapshotPageHeader,@NonNull SnapshotEntityHeader>> | streamPages(@NonNull SnapshotIndex snapshotIndex) | Return a stream of pages in the given SnapshotIndex.
Method Detail
-
loadSnapshotIndex
@NonNull java.util.concurrent.CompletionStage<@NonNull SnapshotIndex> loadSnapshotIndex(@NonNull Url url)
Download and parse the snapshot index at the given URL.
Parameters:
url - the URL to download
Returns:
a SnapshotIndex
Throws:
HttpException - in case of HTTP errors (invalid URL, HTTP error status codes, network errors/timeouts, ...)
SnapshotIndex.ParsingException - when the URL's content could not be parsed as a snapshot index (invalid JSON or missing fields)
See Also:
SnapshotIndex.fromJson(Body)
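Because loadSnapshotIndex returns a CompletionStage, callers usually chain further work onto it and handle failures there. The sketch below illustrates one JDK detail worth knowing: dependent stages report failures wrapped in CompletionException, so the cause has to be unwrapped to see the original error. It assumes that errors are delivered through the returned stage, which this page does not state explicitly. IndexLoadingSketch, loadIndex, unwrap and describe are hypothetical names, a String stands in for SnapshotIndex, and IOException stands in for HttpException.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;

class IndexLoadingSketch {

    // Dependent stages see failures wrapped in CompletionException;
    // return the underlying cause so callers can inspect the real error.
    static Throwable unwrap(Throwable error) {
        return (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause()
                : error;
    }

    // Hypothetical stand-in for loadSnapshotIndex(url): completes with a fake
    // "index" string, or fails with an IOException in place of HttpException.
    static CompletionStage<String> loadIndex(String url) {
        if (url.startsWith("https://")) {
            return CompletableFuture.completedFuture("index for " + url);
        }
        return CompletableFuture.failedFuture(new IOException("unsupported URL: " + url));
    }

    // Chains a processing step onto the stage and maps any failure to a message.
    static String describe(String url) {
        return loadIndex(url)
                .thenApply(index -> "loaded " + index)
                .exceptionally(error -> "failed: " + unwrap(error).getMessage())
                .toCompletableFuture()
                .join();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://example.org/snapshot/index"));
        System.out.println(describe("ftp://example.org/snapshot/index"));
    }
}
```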
-
streamPages
@NonNull java.util.concurrent.Flow.Publisher<@NonNull StreamingPage<@NonNull SnapshotPageHeader,@NonNull SnapshotEntityHeader>> streamPages(@NonNull SnapshotIndex snapshotIndex)
Return a stream of pages in the given SnapshotIndex. The page bodies are not downloaded immediately; only the HTTP headers are downloaded initially. The returned StreamingPage objects allow incrementally streaming the page's entities. They also provide methods to consume the page as a stream of entities and to download the entire page into memory at once.
The number of pages that are requested concurrently can be set with the SnapshotConsumer.Builder.networkConcurrency(int) setting on the builder. If concurrency is 1, pages will be returned in the order they are listed in the snapshot index. If concurrency is >1, pages may be returned out of order.
By default, if an error occurs while requesting a page, the stream is terminated with that error. If the SnapshotConsumer.Builder.delayErrors(boolean) setting is set to true, errors will be collected and returned in a single batch once the stream is exhausted.
Parameters:
snapshotIndex - the snapshot index to stream pages from
Returns:
a stream of StreamingPage for every page URL listed in the snapshot index
Throws:
HttpException - in case of HTTP errors (invalid URL, HTTP error status codes, network errors/timeouts, ...)
PageFormatException - if the Content-Type HTTP header is missing or invalid. Invalid page bodies (e.g. invalid multipart documents) are not reported by this method directly because it doesn't parse page bodies. Errors in page bodies are only reported when the StreamingPages themselves are consumed.
ConsumerException.CollectedErrors - if SnapshotConsumer.Builder.delayErrors(boolean) is true and more than one error occurred
See Also:
StreamingPage.toCompleteEntities(), StreamingPage.toCompletePage()
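The ordering rule for the concurrency setting follows from how bounded concurrency behaves in general. As a rough illustration using only JDK executors (this is not the library's actual implementation), the sketch below runs hypothetical page fetches on a pool of `concurrency` threads and returns the results in completion order. ConcurrencyOrderSketch, fetchAll and the "page:" result strings are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrencyOrderSketch {

    // Runs one task per URL with at most `concurrency` tasks in flight and
    // returns the results in completion order, not in listing order.
    static List<String> fetchAll(List<String> urls, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            ExecutorCompletionService<String> completed = new ExecutorCompletionService<>(pool);
            for (String url : urls) {
                completed.submit(() -> "page:" + url); // hypothetical fetch
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < urls.size(); i++) {
                results.add(completed.take().get()); // next finished task, whichever it is
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a page", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a page fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = List.of("a", "b", "c");
        System.out.println(fetchAll(urls, 1)); // single worker: listing order is kept
        System.out.println(fetchAll(urls, 3)); // several workers: order may differ
    }
}
```

With a single worker the tasks run one after another, so the listing order is preserved. With several workers, whichever task finishes first is returned first, so a consumer that depends on index order has to restore it itself.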
-
streamEntities
@NonNull java.util.concurrent.Flow.Publisher<@NonNull Entity<@NonNull SnapshotEntityHeader>> streamEntities(@NonNull SnapshotIndex snapshotIndex)
Return a stream of entities in the given SnapshotIndex. This builds on streamPages(SnapshotIndex), turning each page into a stream of entities. Individual entity bodies are downloaded completely before being returned.
The number of pages that are downloaded concurrently can be set with the SnapshotConsumer.Builder.networkConcurrency(int) setting on the builder. If concurrency is 1, entities will be returned in the order they appear in their page, and the blocks of entities for a page will be returned in the order the pages are listed in the snapshot index. If concurrency is >1, entities may be returned out of order.
By default, if an error occurs while downloading a page, the stream is terminated with that error. If the SnapshotConsumer.Builder.delayErrors(boolean) setting is set to true, errors will be collected and returned in a single batch once the stream is exhausted.
Parameters:
snapshotIndex - the snapshot index to stream pages from
Returns:
a stream of Entity containing all entities from all pages listed in the snapshot index
Throws:
HttpException - in case of HTTP errors (invalid URL, HTTP error status codes, network errors/timeouts, ...)
PageFormatException - if the Content-Type HTTP header is missing or invalid, or if a page body is unparseable or otherwise invalid
ConsumerException.CollectedErrors - if SnapshotConsumer.Builder.delayErrors(boolean) is true and more than one error occurred
See Also:
streamPages(SnapshotIndex), StreamingPage.toCompleteEntities()
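The difference between the default fail-fast behaviour and delayErrors can be sketched with plain Java. This is only an illustration of the semantics described above, not the library's code. DelayErrorsSketch, consume, samplePages and demo are hypothetical names, and the nested CollectedErrors class is a simplified stand-in for ConsumerException.CollectedErrors. Rethrowing a single delayed error as-is is an assumption inferred from the "more than one error occurred" wording.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class DelayErrorsSketch {

    // Simplified, hypothetical stand-in for ConsumerException.CollectedErrors.
    static final class CollectedErrors extends RuntimeException {
        final List<RuntimeException> errors;

        CollectedErrors(List<RuntimeException> errors) {
            super(errors.size() + " errors occurred");
            this.errors = List.copyOf(errors);
        }
    }

    // Passes each page body to the sink. Without delayErrors the first failure
    // aborts processing; with delayErrors failures are collected and reported
    // only after all pages have been attempted.
    static void consume(List<Supplier<String>> pages, boolean delayErrors, Consumer<String> sink) {
        List<RuntimeException> errors = new ArrayList<>();
        for (Supplier<String> page : pages) {
            String body;
            try {
                body = page.get();
            } catch (RuntimeException e) {
                if (!delayErrors) {
                    throw e; // fail fast: terminate with this error
                }
                errors.add(e); // remember the error and keep going
                continue;
            }
            sink.accept(body);
        }
        if (errors.size() == 1) {
            throw errors.get(0); // assumption: a single error is reported as-is
        }
        if (errors.size() > 1) {
            throw new CollectedErrors(errors);
        }
    }

    // Four hypothetical pages, two of which fail.
    static List<Supplier<String>> samplePages() {
        List<Supplier<String>> pages = new ArrayList<>();
        pages.add(() -> "p1");
        pages.add(() -> { throw new IllegalStateException("e1"); });
        pages.add(() -> "p3");
        pages.add(() -> { throw new IllegalStateException("e2"); });
        return pages;
    }

    // Summarizes which pages were seen and which error ended the run.
    static String demo(boolean delayErrors) {
        List<String> seen = new ArrayList<>();
        try {
            consume(samplePages(), delayErrors, seen::add);
            return "seen=" + seen + " no error";
        } catch (RuntimeException e) {
            return "seen=" + seen + " error=" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // seen=[p1] error=e1
        System.out.println(demo(true));  // seen=[p1, p3] error=2 errors occurred
    }
}
```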
-
builder
static @NonNull SnapshotConsumer.Builder builder()
Create a new SnapshotConsumer.Builder with default settings. Use the SnapshotConsumer.Builder.build() method on the returned builder to create a SnapshotConsumer with the specified settings.
Returns:
a new builder