First things first, we need to define the payload that will be shared between the processors for each stage of the pipeline:
```go
type crawlerPayload struct {
	LinkID      uuid.UUID
	URL         string
	RetrievedAt time.Time
	RawContent  bytes.Buffer

	// NoFollowLinks are still added to the graph but no outgoing edges
	// will be created from this link to them.
	NoFollowLinks []string

	Links       []string
	Title       string
	TextContent string
}
```
The first three fields, LinkID, URL, and RetrievedAt, will be populated by the input source. The remaining fields will be populated by the various crawler stages:
- RawContent is populated by the link fetcher
- NoFollowLinks and Links are populated by the link extractor
- Title and TextContent are populated by the text extractor
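Because the same payload value travels through every stage, it helps to think about how instances are created, copied, and retired. The following is a minimal sketch, assuming the pipeline expects payloads to expose `Clone` and `MarkAsProcessed` methods and that a `sync.Pool` is used to recycle instances; the method names and the `payloadPool` variable are illustrative assumptions, not a fixed contract, and the code reuses the `crawlerPayload` struct defined above:

```go
import "sync"

// payloadPool recycles crawlerPayload instances to cut down on allocations
// while the crawler is running. (Assumed helper; names are illustrative.)
var payloadPool = sync.Pool{
	New: func() interface{} { return new(crawlerPayload) },
}

// Clone returns an independent copy of the payload so that a downstream
// stage can mutate it without affecting the original.
func (p *crawlerPayload) Clone() *crawlerPayload {
	newP := payloadPool.Get().(*crawlerPayload)
	newP.LinkID = p.LinkID
	newP.URL = p.URL
	newP.RetrievedAt = p.RetrievedAt
	newP.NoFollowLinks = append([]string(nil), p.NoFollowLinks...)
	newP.Links = append([]string(nil), p.Links...)
	newP.Title = p.Title
	newP.TextContent = p.TextContent

	// bytes.Buffer values should not be copied by assignment; duplicate the
	// buffered bytes instead, leaving the original buffer intact.
	newP.RawContent.Write(p.RawContent.Bytes())
	return newP
}

// MarkAsProcessed zeroes the payload and returns it to the pool once the
// final stage of the pipeline is done with it.
func (p *crawlerPayload) MarkAsProcessed() {
	p.URL = ""
	p.RawContent.Reset()
	p.NoFollowLinks = p.NoFollowLinks[:0]
	p.Links = p.Links[:0]
	p.Title = ""
	p.TextContent = ""
	payloadPool.Put(p)
}
```

Pooling keeps the crawler from allocating a fresh payload for every link it visits, while a deep `Clone` lets a stage fan the same payload out to multiple downstream processors without them stepping on each other's data.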