Model Card

The model card provides a quick summary of a model's key parameters.

Model Card Display:

The model card is displayed when you click on a model in Serverless. It appears on the right-hand side and provides the information you need to understand exactly which model you are working with. This is especially useful when other vendors don't disclose the size, quantization, context length, or other details.

IMAGE HERE

Information Displayed in the Model Card:

  • Model Author: This is the model family, for example Llama, DeepSeek, Qwen, etc.

    • Example: Qwen

  • Model Class: This is the specific model within its author's family.

    • Example: Llama 3.3 70b

  • Model Size: This is the parameter size of the model. In machine learning models, especially neural networks, parameters are the numerical values the model learns and adjusts during training. A higher parameter count generally allows a model to represent more complex relationships within the data and potentially achieve better performance on tasks.

    • Example: 405b

  • Tags: These are quick indicators of a model's qualities, for instance Visual, Reasoning, or Embedding.

    • Example: Visual

  • Context Length: The context window (or “context length”) of a large language model (LLM) is the amount of text, in tokens, that the model can consider or “remember” at any one time. A larger context window enables an AI model to process longer inputs and incorporate a greater amount of information into each output.

    • Example: 32k

  • Cost: This is the cost per million tokens. Currently we charge the same for both input and output tokens.

    • Example: $1.00/M

  • Quantization: Model quantization is a technique in machine learning where a model's parameters, like weights and activations, are converted from high-precision floating-point representations (typically 32-bit) to lower-precision formats (like 8-bit integers). This reduces memory usage and can speed up inference, usually with a small accuracy trade-off.

    • Example: FP8
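The numeric fields above (size, quantization, and cost) can be combined in simple back-of-the-envelope calculations. The sketch below is illustrative only: the byte-per-parameter table is a standard approximation for weight storage (it ignores activations and overhead), and the specific figures (405B parameters, FP8, $1.00/M tokens) are just the example values shown on the card, not live pricing.

```python
# Rough estimates from model-card fields: weight memory from parameter count
# and quantization, and request cost from price per million tokens.
# Illustrative only; values are the card's example figures, not live pricing.

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "BF16": 2, "FP8": 1, "INT8": 1}

def weight_memory_gb(num_params: float, quantization: str) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return num_params * BYTES_PER_PARAM[quantization] / 1e9

def request_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost when input and output tokens are billed at the same rate."""
    return total_tokens / 1_000_000 * price_per_million

print(f"405B @ FP8:  ~{weight_memory_gb(405e9, 'FP8'):.0f} GB")    # ~405 GB
print(f"405B @ FP32: ~{weight_memory_gb(405e9, 'FP32'):.0f} GB")   # ~1620 GB
print(f"250k tokens at $1.00/M: ${request_cost(250_000, 1.00):.2f}")  # $0.25
```

This also shows why quantization matters in practice: moving the same 405B-parameter model from FP32 to FP8 cuts its weight footprint roughly fourfold.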
