Advanced Data Layouts
Advanced Data Layouts (or just Advanced Layouts, or ADLs for short) are a kind of plugin system for IPLD which is used when we want to present some data as if it were a Data Model Node, while actually storing it as a different Node (or as several Nodes!).
This is easiest to understand by example:
- sharded map example
- sharded bytes example
- encryption example
- link to unixfs "directory" story
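As a rough illustration of the sharded map case, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ShardedMap` class, the `build_sharded` helper, the `{"shards": [...]}` layout, and the CRC32-based shard choice are hypothetical and do not correspond to any specified IPLD ADL. In a real system each shard would typically be a separate block reached through a link, rather than a nested map.

```python
from collections.abc import Mapping
import zlib


def _shard_index(key, shard_count):
    # Stable hash so that readers and writers agree on the shard for a key.
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedMap(Mapping):
    """Hypothetical toy ADL: presents several shard maps as one logical map."""

    def __init__(self, raw):
        # `raw` is the substrate as plain Data Model data:
        # {"shards": [ {key: value, ...}, ... ]}
        self._shards = raw["shards"]

    def __getitem__(self, key):
        # Raises KeyError for missing keys, as a regular map would.
        return self._shards[_shard_index(key, len(self._shards))][key]

    def __iter__(self):
        for shard in self._shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self._shards)


def build_sharded(entries, shard_count):
    """Write path: the caller hands over one map; several shard maps are stored."""
    shards = [{} for _ in range(shard_count)]
    for key, value in entries.items():
        shards[_shard_index(key, shard_count)][key] = value
    return {"shards": shards}


raw = build_sharded({"apple": 1, "banana": 2, "cherry": 3}, shard_count=2)
view = ShardedMap(raw)
```

Here `raw` is what would actually be stored, while `view` behaves like a single map: `view["banana"]` and `len(view)` work without the caller knowing how the entries were split up.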
You should probably read the doc about the Data Model first, if you haven't already. ADLs build upon the concepts that are introduced and standardized by the Data Model.
- ADLs convert Data Model nodes into another node (or, when writing new data, provide an interface to go the other way: let the user act like they're creating a node, but in the background create several nodes, or a different structure, which stores that data).
- Codecs and ADLs compose smoothly -- Codecs can deserialize and serialize the Data Model data that is the "raw" "interior" content of an ADL.
- Schemas technically have nothing to do with ADLs...
- but, Schemas can be useful for signaling when ADLs should be used to handle data (more on that later);
- and in practice, ADL specifications often include a Schema which describes them, simply for clarity (and ADL implementations might choose to use that Schema in their internal code, too).
- Traversals and pathing work transparently over ADLs (which is part of why ADLs exist and what makes them awesome in the first place)!
Because ADLs make complex data structures readable and writable as "just" a Node, it means all the features of IPLD that work over regular Nodes work over ADLs, too.
- Traversals and pathing work transparently over ADLs;
- that also includes Selectors working over ADLs;
- "IPLD Patch" tools (still forthcoming) work transparently over ADLs;
- any kind of custom library functions you've written that work over Nodes? They'll "just work" with ADLs.
This reusability makes a ton of features possible for building systems with ADLs, and makes it work with a minimum of development effort.
In particular, the Selectors story matters, because without ADLs there is no good fallback. Having a Selector walk over the inner state of an unknown data structure (let's take a HAMT as an example, though the principle is general) is only possible if you know the load factor of the structure, or other specific details of its internal state. For many applications of Selectors -- especially, say, the use of Selectors to ask someone else on the internet to send you data that you don't already have -- this would make Selectors all but useless. However, by running Selectors over an ADL, things work out nicely.
(Note: some ADLs are read-only. Some are write-only. Sometimes there's a symmetrical pairing of implementations; sometimes there's not. Some need configuration on one direction, but not on the other. You'll need to consult the documentation and specs for individual ADLs to see what they support.)
- emphasis on one node as the result: whether it be map or list or bytes or etc, one.
- include concrete example of what kind of transformation you'd be better off doing with schema.
- clarify that without ADL code activated, the raw data can still be read and even traversed... just differently.
- clarify that codecs and ADLs compose, there's a clear layering there.
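To make the points about transparent pathing and about reading raw data without the ADL more concrete, here is a small Python sketch. The `resolve_path` function, the `BucketedMap` class, and the `{"buckets": [...]}` layout are all hypothetical names and shapes invented for this example; they are not part of any IPLD library or specification.

```python
from collections.abc import Mapping


def resolve_path(node, path):
    """Generic pathing: walk a '/'-separated path over maps and lists."""
    for segment in path.split("/"):
        if isinstance(node, Mapping):
            node = node[segment]
        elif isinstance(node, list):
            node = node[int(segment)]
        else:
            raise KeyError(f"cannot descend into {type(node).__name__}")
    return node


class BucketedMap(Mapping):
    """Hypothetical toy ADL: several bucket maps presented as one map."""

    def __init__(self, raw):
        self._buckets = raw["buckets"]

    def __getitem__(self, key):
        for bucket in self._buckets:
            if key in bucket:
                return bucket[key]
        raise KeyError(key)

    def __iter__(self):
        for bucket in self._buckets:
            yield from bucket

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets)


raw = {"buckets": [{"apple": 1}, {"banana": 2, "cherry": 3}]}

# Without the ADL, the data is still readable, but the path exposes the layout.
raw_result = resolve_path(raw, "buckets/1/banana")

# With the ADL, the same generic function sees one logical map.
adl_result = resolve_path(BucketedMap(raw), "banana")
```

Both calls return the same value. The difference is that the raw path depends on which bucket the key happens to live in, while the ADL path does not, and `resolve_path` itself needed no changes to work over the ADL.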
ADLs use code, and some sort of plugin system is needed in IPLD libraries to support this.
How exactly those plugin systems work, and what kind of format the code needs to be authored in, and exactly what interfaces need to be adhered to: these will all vary per IPLD library and the language the IPLD library is in.
(Someday, a system for portable ADL code would be neat. However, we currently consider that a research problem: some notes can be found in open-research/ADLs-we-can-autoexecute.)
How do you know when a piece of data should be interpreted through an ADL? We call this "the signaling problem".
In short: from the data alone, you don't.
The "raw", interior data of an ADL is just regular IPLD Data Model data (it must be, after all, since it's produced by some Codec, which by definition produces data structures describable by the Data Model). It follows that there's absolutely no way for this data to unambiguously indicate that it needs an ADL in order to be understood. If there were, it would imply that there are some kind of "reserved words" in the Data Model, which would violate some of our other central goals in IPLD, because it would mean some perfectly normal maps and lists would be invalid IPLD or gain magical meaning that they shouldn't; we don't want any of that.
So! Signaling must come from somewhere else.
There are a variety of valid options:
One useful system we have which can provide an answer to the signaling question is IPLD Schemas. Since Schemas are already a declarative way to talk about the structure of data, it's quite reasonable that they should also be able to talk about where the structure of data uses an ADL.
A page on Indicating ADLs with Schemas talks more about this.
However, you don't have to use IPLD Schemas if you want to use ADLs. Keep reading the next couple of sections for more alternatives that you can use to answer the signaling question.
- a remark should be present here on the interesting limitation about non-recursive descriptions being somewhat high-friction to reach with this mechanism. (although maybe this belongs in a separate deeper-diving doc in another page).
- discuss this
- link to the go-ipld-prime NodeReifier callback as an example of this
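The general idea of programmatic signaling through a callback can be sketched in a few lines of Python. This is only an analogy for the concept: the `load` function, its `reifier` parameter, and the `join_chunks` toy ADL are hypothetical names made up for this example, and they are not the go-ipld-prime NodeReifier API or its signature.

```python
def join_chunks(raw):
    """Hypothetical toy 'sharded bytes' ADL: several chunks read as one bytes value."""
    return b"".join(raw["chunks"])


def load(raw, reifier=None):
    """Return the raw node, or whatever the caller-supplied reifier makes of it.

    The data itself carries no signal; the calling code decides whether an
    ADL applies by choosing to pass a reifier callback (or not).
    """
    if reifier is None:
        return raw
    return reifier(raw)


raw = {"chunks": [b"hel", b"lo"]}

plain = load(raw)                       # the substrate, exactly as stored
reified = load(raw, reifier=join_chunks)  # the same data seen through the ADL
```

The design point is that the decision lives in application code: the same `raw` value yields a map in one call and a single byte string in the other, purely because of what the caller asked for.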
We have no currently active specifications for other forms of declarative signaling.
However, you can imagine making such a system yourself fairly easily: all that's necessary is to decide what that declarative format is that you want, and write a system that binds it to the relevant programmatic APIs of the IPLD libraries you use, and everything should work out from there.
Additional declarative signaling specifications may be something that is ratified into IPLD in the future. (If you'd like to drive this work, please feel free to get in touch!)
(Some systems have already done this in their own ways: for example, parts of the Filecoin Lotus project expose "paths" in their CLI which have an extension that is used in that application to signal where to engage ADLs. You can do things like this in your own applications, too! It's worth noting, however, that what the Filecoin Lotus project does here is not considered a well-specified IPLD behavior, and in fact carries several caveats which constrain what is valid data for that application to process to a range that is far narrower than what the IPLD Data Model specifies.)
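A home-grown declarative signaling system of the kind described above might look roughly like the following Python sketch. The format here (a table from path to ADL name), the `ADL_REGISTRY` and `SIGNALS` names, and the `read` function are all assumptions invented for illustration; none of this is a specified IPLD mechanism, and it is not how Filecoin Lotus does it.

```python
# Hypothetical registry binding ADL names to programmatic implementations.
ADL_REGISTRY = {
    "joined-bytes": lambda raw: b"".join(raw["chunks"]),
}

# Hypothetical declarative signal, kept apart from the data: path -> ADL name.
SIGNALS = {
    "files/readme": "joined-bytes",
}


def read(root, path, signals=SIGNALS, registry=ADL_REGISTRY):
    """Walk `path` over nested maps, then apply an ADL if one is declared for it."""
    node = root
    for segment in path.split("/"):
        node = node[segment]
    adl_name = signals.get(path)
    if adl_name is None:
        return node  # no signal: hand back the raw Data Model data
    return registry[adl_name](node)


root = {
    "files": {
        "readme": {"chunks": [b"hi ", b"there"]},
        "notes": {"chunks": [b"raw"]},
    }
}

readme = read(root, "files/readme")  # declared as an ADL, so chunks are joined
notes = read(root, "files/notes")    # same shape, no declaration, so left raw
```

Note that `readme` and `notes` have the same raw shape; only the external declaration makes one of them be read through the ADL, which is the essence of keeping the signal out of the data.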
- "running foreign code on somebody else's budget" is not something that happens at unbounded scale on public services
- availability in many languages/libraries, and authorship/maint effort implied -- it's better to use community-common things if you can
- similar to codecs in this regard
- reminder that schemas are usable on public infra (like e.g. on hosted IPLD Explorer tools), because they have predictable computation cost envelopes -- reminder to prefer doing things with a schema rather than an ADL if you can; don't reach for ADLs just because you want a funky fresh custom format