How IPFS Web Gateways Work

This document talks through how IPFS web gateways work; the various forms of abstraction used in accomplishing parts of that task; and how and why these designs are cool (and how to can reuse parts of them).

:tea: Note! This is an aspirational document -- IPFS web gateways are a feature which has substantial history. They haven't always worked like this.

What are the goals?

What are IPFS Web Gateways, anyway?

The "web gateway" is a feature of IPFS that serves the contents of files over HTTP. It selects what content to serve by taking the path from the URL requested, and splitting it into two parts: the CID and the remainder of the path.

The CID is used to pick some piece of data to start at. The remainder of the path will be used as instructions to traverse IPLD data, starting from that data which the CID identified. At the end of that traversal, we'll try to understand whatever is pointed to as a bunch of bytes (a "file" isn't much more than that)... and pour it out into an HTTP response.

This is useful because it means we can store data in IPFS, and that data can become easily accessible as normal webpages.

How do we do it?

There's actually a ton of moving parts to this. Let's break it down:

files are just bytes

(... in this context, anyway.)

TODO expound

large bytes are provided by ADLs (FBL, specifically)

TODO expound

directories are provided by ADLs (HAMT, specifically)

TODO expound

... which is neat, because most of it is also going through ADLs, as we just described in the previous sections.

Because defining an ADL requires defining how the ADL will make its content data legible to regular IPLD Data Model semantics, we can use standard library functions for pathing over this data. No special path handling logic is written for the Web Gateways (or other users of unixfsv2) at all!

knowing where to use ADLs, and which ones, is defined by IPLD Schemas

... and more specifically, the Unixfsv2 schema, which is pretty common and used by much more than the IPFS web gateway system.

The web gateways know about the schema for unixfsv2, and they assume that it applies to whatever data is being requested through the gateway system.

If the data doesn't match, it won't be renderable through the web gateway (and we return errors about how and where the data ceased to match the schema that's expected for unixfsv2 data).

If the data does match: then the schema also tells us where to expect ADLs to come into effect, and also tells us what Data Model semantics to expect from the data that is made available once the ADL logic is applied.

if you don't have this data: we fetch it

This part hits the limits of what is directly in the scope of IPLD, but in brief, what happens in IPFS is:

What if we want to peek behind the curtain?

All of the structures above are serialized in CBOR. All the magic comes from IPLD semantic features like ADLs and Schemas.

What's neat about this is it's also very easy to drop all the magic and look directly at the raw data.

Want to see the raw CBOR objects that make up all these structures? No problem. They're hidden just barely under the hood, and we try to make it easy to lift that hood up and take a peek.

In fact, you can inspect all the raw CBOR objects making up these structures simply by using IPLD libraries to start at the same CID, and then use regular Data Model traversal mechanisms to stroll around. If you provide the Unixfsv2 schema (and thereby its hints about where to apply ADLs), you'll get the high level view of "directories" (maps) and "files" (bytes); if you don't, you'll get to stroll around the raw CBOR objects instead.

What's reusable?

Gosh. Dang near everything.

The whole IPFS web gateway system was built in terms of standard IPLD components, assembled: schemas, ADLs, and plain basic pathing and traversal over the Data Model were all we needed to accomplish all this.

What other limits does this have?

It's important to note that the web gateways do almost all of this work using IPLD libraries and not much else... but, ADLs also have some special caveats.

Because ADLs need some logic to make their translations of data work, and because IPLD doesn't itself include an interpreter or scripting language runtime (and even if it did, we'd be worried about capping its resource budget)... IPLD libraries typically require that a list of allowed ADLs be defined explicitly.

The IPFS web gateways specifically allow the FBL and HAMT ADLs.

If you wanted to produce data that used other sharding algorithms, but was otherwise structurally matching the Unixfsv2 schema, and wanted those to be correctly displayed by some IPFS web gateway? You'd need to reconfigure that gateway to expand the list of allowed ADLs to include whatever sharding system it is your wanted to use.

Historically: how did this previously work?

:closed_book: This section is historical notes -- it's here for readers who were familiar with the old system, and want to be refreshed on how it compares to the new designs documented above, or for readers who want to understand some of the learning experiences gathered from previous bouts at this work. However, you shouldn't really need to read it unless one of those two sources of curiosity appeals to you.

The earliest implementations of IPFS used a codec called "dag-pb" to represent files and directories. ("dag-pb" is still around as an IPLD codec -- but nowadays, we consider it one of the more limited codecs, and a very esoteric one at that.)

The "dag-pb" codec included many responsibilities that we now handle separately: it included the concept of directories (and sharding them) in the codec itself (whereas we now split this out into map ADLs); similarly, it include the concept of files (and sharding their component bytes) in the codec itself (whereas we now split this out into bytes ADLs).

By baking these things into the codec itself, there were several major disadvantages compared to the present IPLD system design:

Looking at the other direction, "dag-pb", even with all its special features, was more limited than the modern IPLD system designs: