Intro to Advanced Data Layouts

Advanced Data Layouts ("ADLs") are how IPLD supports handling large data, such as maps with millions of entries. They are often used for creating "indexes", and can be customized for other use cases as well. In general, ADLs are a way to customize how you see and interact with some data. They can be thought of as a "lens" for data: they take some data and make it legible in a different way. ADLs usually appear as some kind of plugin mechanism in IPLD libraries.

Technical Definition

To give a slightly more technical definition, an ADL is:

Or when writing:

We say that an ADL has a "synthesized" view -- which is the single Node, seen with the ADL -- and a "substrate" -- which is the data as it is serialized.
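To make the two terms concrete, here is a minimal sketch in Python. The bucketed layout and the names `bucket_index`, `BucketedMap`, and `build_substrate` are all hypothetical, invented for illustration; they are not part of any IPLD library, and real sharded maps (such as HAMTs) are considerably more involved.

```python
from collections.abc import Mapping


def bucket_index(key: str, bucket_count: int) -> int:
    """Pick a bucket deterministically (a made-up scheme, not a real HAMT)."""
    return sum(key.encode("utf-8")) % bucket_count


class BucketedMap(Mapping):
    """Synthesized view: behaves like one ordinary, read-only map."""

    def __init__(self, substrate: dict):
        self._buckets = substrate["buckets"]

    def __getitem__(self, key):
        bucket = self._buckets[bucket_index(key, len(self._buckets))]
        return bucket[key]  # raises KeyError when absent, like a dict

    def __iter__(self):
        for bucket in self._buckets:
            yield from bucket

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets)


def build_substrate(entries: dict, bucket_count: int = 4) -> dict:
    """Writing direction: lay plain entries out into the bucketed substrate."""
    buckets = [{} for _ in range(bucket_count)]
    for key, value in entries.items():
        buckets[bucket_index(key, bucket_count)][key] = value
    return {"buckets": buckets}


# The substrate is what would be serialized; the view is what users see.
substrate = build_substrate({"apple": 1, "banana": 2, "cherry": 3})
view = BucketedMap(substrate)
```

Here `substrate` is a map with a single `buckets` key, while `view` can be used like any other map (`view["banana"]`, `len(view)`, iteration) without the caller knowing about buckets at all.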

Common Usecases

One of the most common uses of ADLs is sharded datastructures. These allow creating very large maps or lists.

ADLs can also present a large bytes value as a single value while it is stored sharded across many blocks. This can be useful for representing "files" in IPLD (while still having the data chunked up into blocks, which makes it easier to transfer the data in pieces).
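As a rough illustration of sharded bytes, the sketch below splits a payload into small blocks and reads it back through a single bytes-like view. Everything here is an assumption made for the example: the `CHUNK_SIZE`, the root layout with `size` and `chunks` fields, the in-memory `blocks` dict standing in for a block store, and the SHA-256 hex digests standing in for real links (CIDs).

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example shards visibly


def store_block(blocks: dict, data: bytes) -> str:
    """Store one block and return a content-derived link (a stand-in for a CID)."""
    link = hashlib.sha256(data).hexdigest()
    blocks[link] = data
    return link


def write_sharded_bytes(blocks: dict, payload: bytes) -> dict:
    """Writing direction: chunk the payload and return the root node."""
    links = [
        store_block(blocks, payload[i:i + CHUNK_SIZE])
        for i in range(0, len(payload), CHUNK_SIZE)
    ]
    return {"size": len(payload), "chunks": links}


class ShardedBytes:
    """Synthesized view: one bytes value, backed by many blocks."""

    def __init__(self, root: dict, blocks: dict):
        self._root = root
        self._blocks = blocks

    def __len__(self):
        return self._root["size"]

    def read(self, offset: int, length: int) -> bytes:
        """Load only the chunks that overlap the requested range."""
        end = min(offset + length, len(self))
        out = bytearray()
        first = offset // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE if end > offset else first - 1
        for index in range(first, last + 1):
            chunk = self._blocks[self._root["chunks"][index]]
            start = index * CHUNK_SIZE
            out += chunk[max(offset - start, 0):end - start]
        return bytes(out)

    def __bytes__(self):
        return self.read(0, len(self))


blocks = {}
payload = b"hello, sharded world"
root = write_sharded_bytes(blocks, payload)
view = ShardedBytes(root, blocks)
```

A ranged `read` only touches the blocks it needs, which is the property that makes chunked storage convenient for partial transfer.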

Other uses are also possible! For example:

Plugability

ADLs usually resemble a "plugin" system. There are many different ADLs. Anyone can develop their own new ADL.

To maximize interoperability, and save development time for common needs, we attempt to standardize the serial form for some ADLs that are commonly used. It's worth looking for existing ADLs that do what you need before rolling your own! (For example: If you're looking for a sharded, scalable map -- you're not the first! That's just one example of code you'll find you can share with others.)

Optional

Note that ADLs are optional when reading, too: data written with an ADL is still data you can operate on without that ADL; it will just look different. You can transfer the data, and even walk over it, without an ADL, because it is all still IPLD Data Model.
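The point can be sketched with a generic walker that knows nothing about any ADL. The `walk` function and the bucketed `substrate` shape below are hypothetical, and plain Python dicts, lists, and scalars stand in for Data Model nodes.

```python
def walk(node, path=()):
    """Yield (path, value) for every node in plain maps, lists, and scalars."""
    yield path, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, path + (index,))


# Data written by some (imaginary) bucketed-map ADL, seen without that ADL.
substrate = {"buckets": [{"banana": 2}, {}, {"apple": 1, "cherry": 3}]}

leaf_paths = [path for path, value in walk(substrate) if isinstance(value, int)]
```

The walk succeeds, but the paths it reports expose the internal layout (`buckets`, then a bucket index, then the key) rather than the simple map an ADL would present.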

More Information

Learn briefly about where ADLs fit and what problems they solve in these sections:

Then, learn more about the details of what ADLs are and the boundaries of their interface in these sections:

Also, if you're wondering if ADLs can be composed: spoiler, yes :)

Beyond that, more detailed information is available in the other pages of this chapter, namely:


Where are ADLs in the big picture?

Read the docs about the Data Model first, if you haven't already. ADLs build upon the concepts that are introduced and standardized by the Data Model.

ADLs appear at a middle level of the stack. (You're definitely going to encounter codecs and the Data Model first.)

ADLs are also entirely optional parts of IPLD: they're useful, but they're not the first thing you need to implement if building a new IPLD library in a new language.

Codecs vs ADLs

Codecs take binary, illegible data, and turn it into IPLD Data Model. By contrast, ADLs take already-legible data -- e.g., data that's already been parsed by a codec -- and make it legible in a different way.

ADLs implicitly rely on a codec having been used already (although ADLs are not tied to any particular codec; they just need some codec to handle serialization). Codecs do not use ADLs.
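The division of labour can be sketched as two separate steps. In this illustration, Python's standard `json` module stands in for a real IPLD codec, and `reify_pairs_as_map` with its `pairs` layout is a hypothetical ADL made up for the example.

```python
import json


def decode(block: bytes):
    """Codec step: bytes -> Data Model (JSON stands in for a real IPLD codec)."""
    return json.loads(block.decode("utf-8"))


def reify_pairs_as_map(node) -> dict:
    """ADL step: Data Model -> Data Model.

    Hypothetical layout: a list of [key, value] pairs, presented as a map.
    """
    return {key: value for key, value in node["pairs"]}


block = b'{"pairs": [["a", 1], ["b", 2]]}'
raw = decode(block)             # legible, but in its stored shape
view = reify_pairs_as_map(raw)  # legible in a different way
```

Note that the ADL function never sees bytes: it would work unchanged if the same data had been decoded by a different codec.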

In general, if you can accomplish some goal with an ADL instead of a codec, you probably should -- you'll be interoperable with more things as a result. (See also the Getting Things Done document about this.)

ADLs can deal with multi-block data. This means they're suitable for solving a larger class of problems than a codec is.

Codecs tend to take considerably longer to develop than ADLs. Codecs require more standardization effort, and require agreeing on a multicodec indicator. ADLs can be somewhat more freely developed, because of their nature as an optional layer.

Schemas vs ADLs

Both Schemas and ADLs can be described as "lenses" for data, but they have different purposes and scopes. Schemas only allow very specific "lenses", are designed to be fast, and are mostly intended for structuralizing data and validating it. ADLs have a much broader scope: ADLs allow arbitrary plugins, can contain complex data transformations, and can even trigger multiple data load and store operations internally (as they do when used for sharding algorithms).

As described above, neither Schemas nor ADLs depend on the other. Both are optional parts of IPLD. Both can be used together or independently.

Schemas can be used to "signal" where to use ADLs in a large forest of data; see the Signalling page for more on that.

What ADLs Enable

ADLs are generally used to make some complex system simpler, or more legible; and, to unlock the ability of other IPLD features to apply over that data.

Because ADLs make complex data structures readable and writable as "just" a Node, it means all the features of IPLD that work over regular Nodes work over ADLs, too!

For example:

This reusability makes a ton of features available to systems built with ADLs, with a minimum of development effort.
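A small sketch of this reuse: the pathing function below is written only against generic map and list behaviour, so it works the same over plain data and over an ADL view. `PairListMap`, its `pairs` layout, and `resolve` are hypothetical names for illustration, with Python's `Mapping` and `Sequence` standing in for a library's Node interface.

```python
from collections.abc import Mapping, Sequence


class PairListMap(Mapping):
    """Hypothetical ADL: a list of [key, value] pairs presented as a map."""

    def __init__(self, substrate: dict):
        self._pairs = substrate["pairs"]

    def __getitem__(self, key):
        for candidate, value in self._pairs:
            if candidate == key:
                return value
        raise KeyError(key)

    def __iter__(self):
        return (key for key, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)


def resolve(node, path: str):
    """Generic pathing: knows about maps and lists, not about any ADL."""
    for segment in filter(None, path.split("/")):
        if isinstance(node, Mapping):
            node = node[segment]
        elif isinstance(node, Sequence) and not isinstance(node, (str, bytes)):
            node = node[int(segment)]
        else:
            raise KeyError(f"cannot step into {segment!r}")
    return node


plain = {"users": {"alice": {"age": 30}}}
through_adl = {"users": PairListMap({"pairs": [["alice", {"age": 30}]]})}
```

Calling `resolve(plain, "users/alice/age")` and `resolve(through_adl, "users/alice/age")` gives the same answer, even though `resolve` was never taught about `PairListMap`.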

In particular, the Selectors story is quite powerful, because there is no good fallback without ADLs. Having a Selector walk over the inner state of an unknown datastructure (let's take a HAMT as an example, though the principle is general) is only possible if you know the load factor of the structure, or other specific details of its internal state. For many applications of Selectors -- especially, say, the use of Selectors to ask someone else on the internet to send you data that you don't already have -- this would make Selectors all but useless. However, by running Selectors over an ADL, things work out nicely. (You can see this principle at work in how Graphsync can fulfill transport of all the blocks necessary to traverse a UnixFS path: just send the path itself, together with the instruction to look at the data with a UnixFS ADL, and the protocol will figure the rest out for you.)

How ADLs Work

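ADLs require three things: an interface, some code (plus a plugin system to host it), and a way to be signalled. Each is covered in turn below.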

ADLs need an interface

... but often that interface is quite minor.

For example, in the go-ipld-prime definition of an ADL, the interface is almost entirely just the Node interface -- the same interface already used for regular Data Model data.

There may also be some interfaces required for the creation of the ADL value. (Starting to treat already-loaded nodes as an ADL is often called "reifying".)

The exact details of the interfaces will vary per implementation library; you'll need to consult the documentation of your library for more information.

ADLs use code

ADLs use code, and some sort of plugin system is needed in IPLD libraries to support this.

How exactly those plugin systems work, and what kind of format the code needs to be authored in, and exactly what interfaces need to be adhered to: these will all vary per IPLD library and the language the IPLD library is in.
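As one possible shape for such a plugin system, the sketch below keeps a registry of ADLs by name. This is an illustration only: `register_adl`, `reify`, and the `pairs-as-map` ADL are hypothetical and do not correspond to the API of any particular IPLD library.

```python
from typing import Any, Callable, Dict

# A reifier takes already-loaded Data Model data and returns the ADL's view.
ADLReifier = Callable[[Any], Any]

_registry: Dict[str, ADLReifier] = {}


def register_adl(name: str):
    """Decorator that registers a reifier function under a name."""
    def decorator(reifier: ADLReifier) -> ADLReifier:
        if name in _registry:
            raise ValueError(f"ADL {name!r} is already registered")
        _registry[name] = reifier
        return reifier
    return decorator


@register_adl("pairs-as-map")
def pairs_as_map(node):
    """Hypothetical ADL: a list of [key, value] pairs presented as a map."""
    return dict(node["pairs"])


def reify(name: str, node):
    """Look up an ADL by name and apply it to some loaded data."""
    try:
        reifier = _registry[name]
    except KeyError:
        raise LookupError(f"no ADL registered under {name!r}") from None
    return reifier(node)


view = reify("pairs-as-map", {"pairs": [["a", 1], ["b", 2]]})
```

Keeping ADLs behind a name lookup means the core library does not need to know about any specific ADL ahead of time; new ones are added by registering more functions.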

(Someday, a system for portable ADL code would be neat. However, we currently consider that a research problem: some notes can be found in open-research/ADLs-we-can-autoexecute.)

ADLs need to be signalled

Because ADLs are optional lenses that you can choose (or not choose) to engage when processing data, it follows that an application needs to "signal" whether or not it wants to use an ADL; where in the data it wants to use it; and which ADL to actually use on that data.

We call this "the signalling problem".

This gets quite in depth and has many possible solutions, so we explore it in its own page: hop over to ADL signalling to learn more.
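To give a feel for the three questions (whether, where, and which), here is a sketch of one simple approach, in which the application supplies a table from paths to ADL names. This is only an assumption for illustration and not a description of how any IPLD library solves signalling; `ADLS`, `walk_with_signals`, and the `pairs-as-map` ADL are hypothetical.

```python
def pairs_as_map(node):
    """Hypothetical ADL: a list of [key, value] pairs presented as a map."""
    return dict(node["pairs"])


# Which ADLs exist, by name.
ADLS = {"pairs-as-map": pairs_as_map}


def walk_with_signals(root, segments, signals):
    """Follow path segments, engaging an ADL wherever the signals table says so.

    `signals` maps a path (a tuple of segments) to the name of the ADL that
    should be applied to the node found at that path.
    """
    node = root
    for depth in range(len(segments) + 1):
        adl_name = signals.get(tuple(segments[:depth]))
        if adl_name is not None:
            node = ADLS[adl_name](node)
        if depth < len(segments):
            node = node[segments[depth]]
    return node


root = {"index": {"pairs": [["alice", 30], ["bob", 25]]}}

# Where and which: use "pairs-as-map" on the node at path ("index",).
signals = {("index",): "pairs-as-map"}

with_adl = walk_with_signals(root, ["index", "alice"], signals)
without_adl = walk_with_signals(root, ["index", "pairs", 0, 0], {})
```

With the signal in place, the path `index/alice` resolves through the synthesized map; with an empty table, the same data is still reachable, but only via its raw layout.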

Can ADLs be Composed?

Yes! ADLs can be nested (an ADL can use another ADL inside itself), and multiple ADLs can be used in sequence when pathing (even without knowing about each other).

For example: the UnixFS ADL is actually composed of several smaller ADLs:

In this example, we see both sequential use of ADLs, and composition of ADLs. When a directory points at another directory or a file, that's sequential use of two or more ADLs (and sometimes different ADLs; the directory maps use one, and the file bytes use another). When the UnixFS pathing ADL is used, it composes over the directory ADLs: it uses them internally, and then adds more of its own logic on top.
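The two patterns can be sketched with a toy layout that is only loosely inspired by a filesystem; it is not UnixFS's actual format. The `kind`, `entries`, and `chunks` fields, and the names `Directory`, `file_bytes`, `reify`, and `PathView`, are all hypothetical, and everything lives in one in-memory structure rather than in linked blocks.

```python
from collections.abc import Mapping


class Directory(Mapping):
    """Hypothetical directory ADL: a list of entries seen as name -> child."""

    def __init__(self, substrate: dict):
        self._entries = {e["name"]: e["node"] for e in substrate["entries"]}

    def __getitem__(self, name):
        # Sequential use: the child is reified with whichever ADL it needs.
        return reify(self._entries[name])

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


def file_bytes(substrate: dict) -> bytes:
    """Hypothetical file ADL: a list of chunks seen as one bytes value."""
    return b"".join(substrate["chunks"])


def reify(substrate: dict):
    """Choose an ADL from a 'kind' field (an assumed, in-band signal)."""
    if substrate["kind"] == "dir":
        return Directory(substrate)
    if substrate["kind"] == "file":
        return file_bytes(substrate)
    raise ValueError(f"unknown kind {substrate['kind']!r}")


class PathView:
    """Composing ADL: uses the directory ADL internally, adds slash pathing."""

    def __init__(self, root_substrate: dict):
        self._root = reify(root_substrate)

    def lookup(self, path: str):
        node = self._root
        for name in filter(None, path.split("/")):
            node = node[name]
        return node


tree = {"kind": "dir", "entries": [
    {"name": "docs", "node": {"kind": "dir", "entries": [
        {"name": "readme.txt",
         "node": {"kind": "file", "chunks": [b"hello ", b"world"]}},
    ]}},
]}

content = PathView(tree).lookup("docs/readme.txt")
```

Stepping `Directory(tree)["docs"]["readme.txt"]` by hand is the sequential case (directory, directory, file), while `PathView` is the composed case: it delegates each step to the directory ADL and adds its own path-splitting logic.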