# CLAP Format Specification

- Status: DRAFT
- Document revision: 0.0.1
- Last updated: Feb 6th, 2024
- Author(s): Julian BILCKE (@flngr)

## BEFORE YOU READ

The CLAP format spec is experimental and not finished yet!
There might be inconsistencies, unnecessary redundancies or blatant omissions.

## What are CLAP files?

The CLAP format (.clap) is a file format designed for AI video projects.

It stores prompts and assets in the same container, making it easier to share an AI video project between different people or applications.

## Structure

A CLAP is an array of objects serialized into a YAML text string, then compressed with gzip into a binary file.

The file extension is `.clap`.
The mime type is `application/x-yaml`.

There can be 5 different types of objects:

- one HEADER
- one METADATA
- zero, one or more MODEL(s)
- zero, one or more SCENE(s)
- zero, one or more SEGMENT(s)

This can be represented in javascript like this:

```javascript
[
  clapHeader, // one header object
  clapMeta, // one metadata object
  ...clapModels, // optional array of models
  ...clapScenes, // optional array of scenes
  ...clapSegments // optional array of segments
]
```

## Header

The HEADER provides information about how to decode a CLAP.

Knowing in advance the number of models, scenes and segments helps the decoder parse the information, and in some implementations it also helps with debugging, logging, and provisioning memory.

However, a different scheme may be used in the future in order to support streaming, either by recognizing the shape of each object (its fields) or by using a specific field, e.g. a `_type`.

```typescript
{
  // used to know which format version is used.
  // CLAP is still in development and the format is not fully specified yet,
  // during this period most .clap files will have the "clap-0" format
  format: "clap-0"

  numberOfModels: number // integer
  numberOfScenes: number // integer
  numberOfSegments: number // integer
}
```

## Metadata

```typescript
{
  id: string // ""
  title: string // "project title"
  description: string // "project description"
  licence: string // "information about licensing"

  // this provides information about the image ratio
  // this might be removed in the final spec, as this
  // can be re-computed from width and height
  orientation: "landscape" | "vertical" | "square"

  // the suggested width and height of the video
  // note that this is just an indicator,
  // and might be superseded by the application reading the .clap file
  width: number // integer between 256 and 8192 (value in pixels)
  height: number // integer between 256 and 8192 (value in pixels)

  // name of the suggested video model to use
  // note that this is just an indicator,
  // and might be superseded by the application reading the .clap file
  defaultVideoModel: string

  // additional prompt to use in the video generation
  // this helps adding some magic touch and flair to the videos,
  // but perhaps the field should be renamed
  extraPositivePrompt: string

  // the screenplay (script) of the video
  screenplay: string
}
```

## Models

Before talking about models, we should first describe the concept of an entity: in a story, an entity is something (person, place, vehicle, animal, robot, alien, object) with a name, a description of its appearance, an age, mileage or quality, an origin, and so on.

An example could be "a giant magical school bus, with appearance of a cat with wheels, and which talks".

The CLAP model would be an instance (an interpretation) of this entity, where we would assign it an identity:

- a name and age
- a visual style (a photo of the magic school bus cat)
- a voice style
- and maybe other things, e.g. an origin or background story

As you can see, it can be difficult to create clearly separated categories like "vehicle", "character", or "location" (the magical cat bus could be a location in one scene, a speaking character in another, etc.).

This is why there is a common schema for all models:

```typescript
{
  id: string
  category: ClapSegmentCategory
  triggerName: string
  label: string
  description: string
  author: string
  thumbnailUrl: string
  seed: number

  assetSourceType: ClapAssetSource
  assetUrl: string

  age: number
  gender: ClapModelGender
  region: ClapModelRegion
  appearance: ClapModelAppearance
  voiceVendor: ClapVoiceVendor
  voiceId: string
}
```

TO BE CONTINUED

(you can read "./types.ts" for more information)
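
As an illustration of the Header section, here is a minimal sketch of how a decoder might use the header counts to split the decoded array back into its parts. It assumes the file has already been gunzipped and parsed from YAML into a plain array. The names `ClapParts` and `splitClapEntries` are hypothetical and not part of the spec; only the header fields come from the schema above.

```typescript
// Header fields as described in the Header section of this spec.
interface ClapHeader {
  format: string
  numberOfModels: number
  numberOfScenes: number
  numberOfSegments: number
}

// Hypothetical container for the decoded parts (not defined by the spec).
interface ClapParts {
  header: ClapHeader
  meta: unknown
  models: unknown[]
  scenes: unknown[]
  segments: unknown[]
}

// Splits an already-decoded CLAP array (after gunzip + YAML parsing)
// into header, metadata, models, scenes and segments,
// using the counts declared in the header.
function splitClapEntries(entries: unknown[]): ClapParts {
  if (entries.length < 2) {
    throw new Error("a CLAP needs at least a header and a metadata object")
  }

  const header = entries[0] as ClapHeader
  if (header.format !== "clap-0") {
    throw new Error(`unsupported format: ${String(header.format)}`)
  }

  const { numberOfModels, numberOfScenes, numberOfSegments } = header

  // The header and metadata objects always come first, hence the offset of 2.
  const expected = 2 + numberOfModels + numberOfScenes + numberOfSegments
  if (entries.length !== expected) {
    throw new Error(`expected ${expected} objects, found ${entries.length}`)
  }

  const modelsEnd = 2 + numberOfModels
  const scenesEnd = modelsEnd + numberOfScenes

  return {
    header,
    meta: entries[1],
    models: entries.slice(2, modelsEnd),
    scenes: entries.slice(modelsEnd, scenesEnd),
    segments: entries.slice(scenesEnd),
  }
}
```

Checking the total against the header counts up front lets a decoder reject a truncated or inconsistent file before it tries to interpret individual objects. A streaming decoder, as hinted at in the Header section, would likely need a different approach.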