Graphsync

A protocol to synchronize graphs across peers.

Graphsync uses IPLD Selectors to efficiently transfer graphs (or selected parts of graphs) with a minimal number of independent requests, and thus seeks to keep overhead low in high-latency situations.

[Meta: Status of this doc]

Concepts and Glossary

Interfaces

This is a listing of the data structures and process interfaces involved in the Graphsync protocol. For simplicity, we use Go type notation, though Graphsync itself is language-agnostic.

type Graphsync interface {
    Request(req Request) (Response, error)
}

type Request struct {
    Selector Selector
    Priority Priority      // optional
    Expires  time.Duration // optional
}

type GraphSyncNet interface {
    SendMessage(m Message)
    RecvMessage(m Message)
}

Network Messages

message GraphsyncMessage {

  message Request {
    int32 id = 1;       // unique id set on the requester side
    bytes root = 2;     // a CID for the root node in the query
    bytes selector = 3; // ipld selector to retrieve
    map<string, bytes> extensions = 4; // side channel information
    int32 priority = 5; // the priority (normalized); defaults to 1
    bool  cancel = 6;   // whether this cancels a request
    bool  update = 7;   // whether this is an update to an in-progress request
  }

  message Response {
    int32 id = 1;     // the request id
    int32 status = 2; // a status code.
    map<string, bytes> extensions = 3;    // side channel information
  }

  message Block {
    bytes prefix = 1; // CID prefix (CID version, multicodec, and multihash prefix (type + length))
    bytes data = 2;
  }

  // the actual data included in this message
  bool completeRequestList    = 1; // This request list includes *all* requests, replacing outstanding requests.
  repeated Request  requests  = 2; // The list of requests.
  repeated Response responses = 3; // The list of responses.
  repeated Block    data      = 4; // Blocks related to the responses
}
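
As a rough illustration of how a responder might fold the requests list into its view of outstanding requests, here is a minimal Go sketch. The Request and Outstanding types and the Apply method are hypothetical names invented for this example, and the reading of completeRequestList (drop every outstanding request the message does not repeat) is an assumption based on the field comment above, not a confirmed rule.

```go
package main

// Request mirrors a subset of the Request message fields.
// The Go shape is an assumption made for this sketch.
type Request struct {
	ID     int32
	Cancel bool
	Update bool
}

// Outstanding tracks the in-progress requests from one peer, keyed by id.
type Outstanding map[int32]Request

// Apply folds the request list of one incoming message into the set.
func (o Outstanding) Apply(completeRequestList bool, reqs []Request) {
	if completeRequestList {
		// Assumed semantics: the list is authoritative, so forget
		// everything that the message does not repeat.
		for id := range o {
			delete(o, id)
		}
	}
	for _, r := range reqs {
		switch {
		case r.Cancel:
			delete(o, r.ID)
		case r.Update:
			// An update never creates a request; it only carries
			// extension data for one that is already in progress.
		default:
			o[r.ID] = r
		}
	}
}
```

With this sketch, sending requests 1 and 2, then cancelling 1, then sending a complete list containing only request 3 leaves request 3 as the single outstanding entry.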

Extensions

The Graphsync protocol is extensible. A graphsync request and a graphsync response contain an extensions field, which is a map type. Each key of the extensions field specifies the name of the extension, while the value is data (serialized as bytes) relevant to that extension.

Extensions help make Graphsync operate more efficiently, or provide a mechanism for exchanging side channel information for other protocols. An implementation can choose to support one or more extensions, but it does not have to.
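
Because support for any given extension is optional, a peer needs some way to act on the extensions it knows and pass over the rest. The Go sketch below shows one possible shape for that. Handler, Registry, and Dispatch are hypothetical names, and silently skipping unknown extensions is an assumption of this example rather than required behavior.

```go
package main

import "sort"

// Handler processes the raw bytes carried by one extension.
type Handler func(data []byte) error

// Registry maps extension names to the handlers this peer supports.
type Registry map[string]Handler

// Dispatch runs the handler for every supported extension and returns
// the names of the extensions that were skipped as unsupported.
func (r Registry) Dispatch(extensions map[string][]byte) ([]string, error) {
	names := make([]string, 0, len(extensions))
	for name := range extensions {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for determinism

	var skipped []string
	for _, name := range names {
		handle, ok := r[name]
		if !ok {
			skipped = append(skipped, name)
			continue
		}
		if err := handle(extensions[name]); err != nil {
			return skipped, err
		}
	}
	return skipped, nil
}
```

Returning the skipped names lets the caller log or report unsupported extensions without treating them as a failure.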

A list of well-known extensions is found here.

Updating requests

A client may send an updated version of a request.

An update contains ONLY extension data, which the responder can use to modify an in-progress request. For example, if a responder supports the Do Not Send CIDs extension, it could choose to also accept an update to this list and ignore CIDs encountered later. It is not possible to modify the original root and selector of a request through this mechanism. To change either of them, cancel the request and send a new one.

The update mechanism in conjunction with the paused response code can also be used to support incremental payment protocols.
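
The rule that an update carries only extension data can be sketched in Go as follows. The Request struct, ApplyUpdate, and both error values are hypothetical, and rejecting an update that carries a root or selector (instead of ignoring those fields) is an assumption made for this example.

```go
package main

import "errors"

// Request mirrors the fields of the Request message that matter here.
type Request struct {
	ID         int32
	Root       []byte
	Selector   []byte
	Extensions map[string][]byte
	Update     bool
}

var (
	ErrNotAnUpdate = errors.New("message is not an update for this request")
	ErrImmutable   = errors.New("update must not carry a root or selector")
)

// ApplyUpdate merges the extension data of update into inProgress.
// The root and selector of inProgress are never modified.
func ApplyUpdate(inProgress *Request, update Request) error {
	if !update.Update || update.ID != inProgress.ID {
		return ErrNotAnUpdate
	}
	if len(update.Root) > 0 || len(update.Selector) > 0 {
		return ErrImmutable
	}
	if inProgress.Extensions == nil {
		inProgress.Extensions = make(map[string][]byte)
	}
	for name, data := range update.Extensions {
		inProgress.Extensions[name] = data // newer data replaces older
	}
	return nil
}
```

A requester that needs a different root or selector would get ErrImmutable from this sketch and would have to cancel and re-request.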

Response Status Codes

# info - partial
10   Request Acknowledged. Working on it.
11   Additional Peers. PeerIDs in extra.
12   Not enough vespene gas ($)
13   Other Protocol - info in extra.
14   Partial Response w/ metadata, may include blocks
15   Request Paused, pending update, see extensions for info

# success - terminal
20   Request Completed, full content.
21   Request Completed, partial content.

# error - terminal
30   Request Rejected. NOT working on it.
31   Request failed, busy, try again later (getting DoSed; backoff in extra).
32   Request failed, for unknown reason. Extra may have more info.
33   Request failed, for legal reasons.
34   Request failed, content not found.
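
The three groups above map naturally onto numeric ranges, so a client can decide whether to keep waiting on a request from the code alone. A minimal Go sketch follows. The constant and method names are invented for this example, and treating each group as a full decade (10-19, 20-29, 30-39) is an assumption extrapolated from the listed codes.

```go
package main

// Status is a Graphsync response status code.
type Status int32

// Constant names are hypothetical; the values come from the table above.
const (
	StatusRequestAcknowledged     Status = 10
	StatusAdditionalPeers         Status = 11
	StatusNotEnoughGas            Status = 12
	StatusOtherProtocol           Status = 13
	StatusPartialResponse         Status = 14
	StatusRequestPaused           Status = 15
	StatusRequestCompletedFull    Status = 20
	StatusRequestCompletedPartial Status = 21
	StatusRequestRejected         Status = 30
	StatusRequestFailedBusy       Status = 31
	StatusRequestFailedUnknown    Status = 32
	StatusRequestFailedLegal      Status = 33
	StatusContentNotFound         Status = 34
)

// IsInfo reports whether s is an informational (partial) code.
func (s Status) IsInfo() bool { return s >= 10 && s < 20 }

// IsSuccess reports whether s is a terminal success code.
func (s Status) IsSuccess() bool { return s >= 20 && s < 30 }

// IsError reports whether s is a terminal error code.
func (s Status) IsError() bool { return s >= 30 && s < 40 }

// IsTerminal reports whether no further responses should be expected.
func (s Status) IsTerminal() bool { return s.IsSuccess() || s.IsError() }
```

A requester could keep a request open while IsTerminal is false and release its resources once it turns true.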

Example Use Cases

Syncing a Blockchain

Requests we would like to make for this:

Downloading Package Dependencies

Loading content from deep within a giant dataset

Loading a large video optimizing for playback and seek

Looking up an entry in a sharded directory

Given a directory entry I think might exist in a sharded directory, I should be able to specify the speculative hamt path for that item, and get back as much of that path that exists. For example:

"Give me <shardhash>/AB/F5/3E/B7/11/C3/B9"

And if the item I want is actually just at /AB/F5/3E, I should get that back.
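
The behavior described above amounts to a longest-existing-prefix walk. The Go sketch below shows it against a toy in-memory tree. Shard and Resolve are hypothetical names, and a real HAMT would derive each path segment from hash bits and load nodes by CID, neither of which is modeled here.

```go
package main

import "strings"

// Shard is a toy in-memory stand-in for one node of a sharded directory.
type Shard struct {
	Children map[string]*Shard
}

// Resolve walks the speculative path from root and returns the longest
// prefix of it that actually exists.
func Resolve(root *Shard, path string) string {
	node := root
	var found []string
	for _, segment := range strings.Split(strings.Trim(path, "/"), "/") {
		next, ok := node.Children[segment]
		if !ok || next == nil {
			break // the rest of the speculative path does not exist
		}
		found = append(found, segment)
		node = next
	}
	return "/" + strings.Join(found, "/")
}
```

For a tree that contains only AB, then F5, then 3E, calling Resolve with "/AB/F5/3E/B7/11/C3/B9" returns "/AB/F5/3E".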

Other notes

Cost to the responder. The Graphsync protocol imposes a non-zero additional overhead of CPU and memory on the responder. This cost must be clearly articulated and accounted for; otherwise we will end up opening ugly DoS vectors.