FEP-1311: Media Attachments

Warning

このFEPはまだ翻訳されていません。

ここから翻訳に協力することができます。

Summary

Media Attachments are ubiquitous in the Fediverse. My quick investigation into the explore tab on mastodon.social yields that about half the posts contain an image attachment. The mechanism for these is poorly documented. For example, it is not mentioned in ActivityPub.

My goal in this FEP is to document current usage, and issue recommendations on how to improve it. These recommendations are based on the support table Recommended Media Attachment Format available at FunFedi.dev.

For developers that enjoy making their keyboards smoke, I believe that the above link combined with the content of Testing should be enough to adapt their Fediverse applications. The other parts are meant for people, who which to improve the situation related to media attachments.

Basic format

We will discuss our basic suggested format with the following example of an image attachment.

{
    "type": "Image",
    "name": "A beautiful cow",
    "url": "http://pasture-one-actor/assets/cow.jpg",
    "width": 100,
    "height": 162,
    "mediaType": "image/jpeg",
    "digestMultibase": "zQmaeDPzhNL32WQZnnzB1H6QJWvvFNEHdViDB71yrxyXU1t",
    "size": 9045
}

There is a lot to say here, first how does this relate to communication in ActivityPub which is done by activity. For this consider the activity (taken from data.funfedi.dev) given by

{
  "@context": [
      "https://www.w3.org/ns/activitystreams",
      "https://www.w3.org/ns/credentials/v2",
      {
        "size": "https://joinpeertube.org/ns#size"
      }
    ],
  "type": "Create",
  "actor": "http://pasture-one-actor/actor",
  "to": [
    "http://akkoma/users/witch",
    "https://www.w3.org/ns/activitystreams#Public"
  ],
  "id": "http://pasture-one-actor/actor/S5Szzuugy50",
  "published": "2024-12-05T08:18:48Z",
  "object": {
    "type": "Note",
    "attributedTo": "http://pasture-one-actor/actor",
    "to": [
      "https://www.w3.org/ns/activitystreams#Public",
      "http://akkoma/users/witch"
    ],
    "id": "http://pasture-one-actor/actor/qDqgbPpNQPw",
    "published": "2024-12-05T08:18:48Z",
    "content": "Recommended Image Format",
    "attachment": [
      {
        "type": "Image",
        "name": "A beautiful cow",
        "url": "http://pasture-one-actor/assets/cow.jpg",
        "width": 100,
        "height": 162,
        "mediaType": "image/jpeg",
        "digestMultibase": "zQmaeDPzhNL32WQZnnzB1H6QJWvvFNEHdViDB71yrxyXU1t",
        "size": 9045
      }
    ]
  }
}

We note that the media attachment is contained in the array of attachment of the Note object. Furthermore, in difference to the activity and the object, there is no id property nor actor or attributedTo property. This is on purpose, as those are inherited from the object the media attachment is attached to. See Content Licensing for discussion about not having an attributedTo property.

In particular, one should emphasize that a media attachment not having an id is useful to signify that it is not useful as an object without the note, it is attached to.

Specifying basic properties

To use media attachments, an object MUST have an attachment property, whose value is an array containing objects. Furthermore, the contained objects MUST have a type property. For it to be a media attachment the type property MUST be Audio, Image, or Video. However, the attachment property MAY contain other form of attachments, e.g. FEP-0ea0: Payment Links.

This can be expressed as the json-schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "attachment": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": {
            "type": "string",
            "examples": ["Audio", "Image", "Video"]
          }
        },
        "required": ["type"]
      }
    }
  },
  "required": ["attachment"]
}

We now discuss the specific form a media attachment. In addition to type, a media attachment MUST also contain an url property providing the link to the media. The url property MAY also be an array, see Multiple Media Versions. However, this just to be future proof.

Furthermore, media attachments SHOULD contain a name property providing an alternative plain text description of the media object.

Again this can be represented as a json-schema.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "enum": ["Audio", "Image", "Video"]
    },
    "name": {
      "type": "string",
      "examples": ["A beautiful cow"]
    },
    "url": {
      "type": ["string", "array"],
      "examples": ["http://you.example/image.png"]
    }
  },
  "required": ["type", "url"]
}

Current state of type

Many current implementations use Document for media attachments. This has the distinct disadvantage to using Audio, Image, or Video that the parser needs to inspect more content than type to discover what type of attachment it is.

Let's write a bit of python to determine if something is a media attachment based on mediaType, discussed later.

def is_media(attachment:dict) -> bool:
    media_type = attachment.get("mediaType")
    if media_type is None:
        ... # handle error case

    main_type, _ = media_type.split("/", 1)

    return main_type in ["audio", "image", "video"]

This already is fairly complicated, but there is more. If one allows url to be an array, one needs a different check, so it turns into something like:

def is_media(attachment:dict) -> bool:
    url = attachment.get("url")
    if isinstance(url, list):
        return is_media(url[0])

    media_type = attachment.get("mediaType")
    if media_type is None:
        ... # handle error case

    main_type, _ = media_type.split("/", 1)

    return main_type in ["audio", "image", "video"]

There are more exceptions and poor configuration to treat, e.g. url could be an empty list, or the implementation could attach mediaType to the full object instead of the Link.

The goal of a specification needs to be to simplify the code that needs to be written, so we insist on people using the types Audio, Image, and Video for media attachments.

Properties of the linked file

In this section, we discuss properties related to the linked file. The linked file is retrieved either by performing a GET request on the value of url or on href of the Link objects if url is an array. Just to mention it, this requirement might change, once Authentication and Authorization is dealt with.

Let's look back at our original example of a media attachment

{
    "type": "Image",
    "name": "A beautiful cow",
    "url": "http://pasture-one-actor/assets/cow.jpg",
    "width": 100,
    "height": 162,
    "mediaType": "image/jpeg",
    "digestMultibase": "zQmaeDPzhNL32WQZnnzB1H6QJWvvFNEHdViDB71yrxyXU1t",
    "size": 9045
}

Here the linked file is given by a GET on http://pasture-one-actor/assets/cow.jpg and the result would be

A beautiful cow

The properties mediaType, digestMultibase, and size could be valid for any attached file, even a non media one, e.g. a text document. Let's quickly review them. mediaType is defined in the ActivityStreams Vocabulary. It describes the MIME type and tells us important information on how to render the file.

digestMultibase is defined here as part of Verifiable Credential Data Integrity. The encoding of a digest in multibase with multihash is somewhat different to the rest of multicodec, because one first has a byte to indicate the format, then another one to indicate the length. This means in particular that all digestMultibase using sha-256 will start with zQm, the z indicating base58 encoding. Checking the digest is important to ensure integrity. As media is often hosted off site using S3, this seems important. For another usage see Content Addressed Storage.

Finally size being the file size in bytes is borrowed from PeerTube. The size should tell us if we want to preload the media or not.

There is something missing in the file properties: access control, see the section Authentication and Authorization in the open questions below.

Specifying file properties

The creator of a media attachment SHOULD include the values of mediaType, digestMultibase, and size. The consumer of a media attachment SHOULD ensure integrity of the downloaded attachment based on digestMultibase, i.e. check the digest. The consumer of a media attachment SHOuLD decide based on size and mediaType the best way to consume the attachment.

size and mediaType become more relevant when multiple versions of the media attachment are provided. For example, this could mean that in one feeds one only sees the low quality video by default.

Properties of an image

We have now discussed all properties of our example document except for width and height. These properties are only relevant for an image and a video, but not for audio. Similarly, audio and video can have a duration, which images don't. Finally, Mastodon has introduced the additional properties

where at least focalPoint is user defined. There are a lot of other properties one can consider for media, e.g.

  • Where was the picture taken? e.g. location
  • What is the frame rate of the video? e.g. fps
  • Provide an album cover for audio?

In order to standardize these things further work is needed.

Multiple Media Versions

As it is currently not supported in the Fediverse, I will just give the basic example how to use multiple attached Links:

{
  "type": "Video",
  "name": "A beautiful cow eating",
  "url": [
    {
      "type": "Link",
      "size": 54373,
      "digest": "zQmSzK5qEe5tpjwGMhmjx9RvVoPkWhEmCwxP2s7wPMpKMoK",
      "width": 256,
      "height": 144,
      "href": "http://pasture-one-actor/assets/cow_eating.mp4",
      "mediaType": "video/mp4"
    },
    {
      "type": "Link",
      "size": 2271723,
      "digest": "zQme2X4rgWuRdmAtGGMSEbdoeRQ2NAL2VptcdRGTYDZbSKG",
      "width": 1920,
      "height": 1080,
      "href": "http://pasture-one-actor/assets/cow_eating_hd.mp4",
      "mediaType": "video/mp4"
    }
  ],
  "duration": "PT3S"
}

As the example shows, this is useful to attach both a low quality version (54kb) and a high quality one (2.2MB) of a video.

We think that supporting this will open the door for richer applications.

Testing

By using json-schema, one can validate some level of correctness of generated media attachments. Relevant schemas are available at Fediverse schemas for media attachments. They can be combined into a feature test using Gherkin, see Media Format.

If you wish to validate everything, including digest, you can use the examples provided at FunFedi.dev.

Open Question

This section is essentially a todo list for the community on stuff that should be fixed, but isn't yet.

Content Licensing

The picture in the examples was created based on this picture available for free on pixabay by photographer derekmuller. Unfortunately, the current standards to not let me attach this information to my media object.

One could now say that this could be solved with just using the attributedTo property. Unfortunately, this has a lot of drawbacks. For example derekmuller is not an ActivityPub actor. Also attributing my cropped low resolution image to him, might be something he does not appreciate. Finally, just attributing this image is probably not enough, one should also inform people on how it is licensed.

See FEP-c118 and its discussions for more on the topic.

Authentication and Authorization

Currently, image links must be accessible without any form of authorization. This is due while communication between a user and their server requiring authentication and between servers requiring authentication, images are often stored on third party services, e.g. S3, thus adding authentication is hard.

For some approaches to resolve this see this Fediverse discussion.

One approach to achieve authentication and authorization easily with existing technologies would be Bearcaps.

For a different approach see also Binary Fediverse transport.

Content addressed storage

Storing media is costly. It is thus important to avoid duplication. By having a digest for all media through the digestMultibase property, we can use this to index our media storage. This means that before downloading a file, we can check if we already have it.

Mixed media content

Consider posting a song, e.g. something from the brat summer, then you might want to attach the album cover, e.g. an image featuring the color #8ACE00. Maybe you will also want to attach some lyrics. This means that your media content contains three parts of separate media type.

One might want to extend the schema for media attachments to convey this information.

Binary Fediverse transport

A failure of ActivityPub is that it restricts transport to be JSON. ActivityPub thus forces people to use external means, e.g. download the file, to convey media content.

One could solve some problems, e.g. Authentication and Authorization, by just allowing transport of binary blobs. This would require an extension of the wire format.

Allowing for messages to contain binary blobs would also for sharing media via thick clients.

参考文献

Posts

Apparently Streams has some mechanism for protecting attachments. Media URLs in non-public posts look like this: https://{domain}/photo/{filename}.jpg?token={token}

IIRR at least in Hubzilla that token is just part of OpenWebAuth's "magic authentication". Where I guess the token contains info about which instance(s?) to contact in order to verify your identity. The audience is kept in the media server(s) database and sync'ed between clones.

著作権

CC0 1.0 ユニバーサル (CC0 1.0) パブリック ドメイン

法律で認められる範囲において、この Fediverse 拡張提案の著者は、この作品に対するすべての著作権および関連する権利または隣接する権利を放棄しています。