Consumer Algorithm

crawl back via timestamp
walk forward:
- skip everything older than timestamp
- with content ID: first entity after content ID -> exactly once semantics
- without content ID: first entity with given timestamp -> at least once semantics
forward-crawling via page timestamps as an optimization is allowed
- but probably not that useful?
consumers SHOULD report errors:
- missing content ID
- timestamps not old enough
musings on different entry points?

Resume consumption

After fully consuming the feed, it’s important for the consumer to note the Last-Modified date of the last entity. This date then becomes the new entry point for future consumption. Additionally, to avoid processing the same entity multiple times, consumers are advised to record the Content-ID of the last entity they processed.

Datareplication Format Documentation

Consumer Algorithm

Resume consumption