You are a helpful assistant that classifies AI models and returns JSON descriptions. Here's the model to classify:

## Basic model info

Model name: meta-llama-3-8b-instruct
Model description: An 8-billion-parameter language model from Meta, fine-tuned for chat completions

## Model inputs

- top_k: The number of highest-probability tokens to consider when generating the output. If > 0, only the top k most probable tokens are kept (top-k filtering). (integer)
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of top tokens with cumulative probability >= top_p is kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sketch of both filters after the example below. (number)
- prompt: Prompt (string)
- max_tokens: The maximum number of tokens the model should generate as output. (integer)
- min_tokens: The minimum number of tokens the model should generate as output. (integer)
- temperature: The value used to modulate the next-token probabilities. (number)
- prompt_template: Prompt template. The string `{prompt}` will be substituted for the input prompt. If you want to generate dialog output, use this template as a starting point and construct the prompt string manually, leaving `prompt_template={prompt}`. (string)
- presence_penalty: Presence penalty (number)
- frequency_penalty: Frequency penalty (number)

## Model output schema

{
  "type": "array",
  "items": {
    "type": "string"
  },
  "title": "Output",
  "x-cog-array-type": "iterator",
  "x-cog-array-display": "concatenate"
}

If the input or output schema includes a format of URI, it is referring to a file.

## Example inputs and outputs

Use these example inputs and outputs to better understand the types of inputs the model accepts, and the types of outputs the model returns:

Example 1:

Input:

top_p: 0.95
prompt: Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?
temperature: 0.7
system_prompt: You are a helpful assistant
length_penalty: 1
max_new_tokens: 512
stop_sequences: <|end_of_text|>,<|eot_id|>
prompt_template: |+
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>

  {system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

presence_penalty: 0

Output:

[ "The", " number", " of", " parameters", " in", " a", " neural", " network", " can", " impact", " its", " speed", ",", " but", " it", "'s", " not", " the", " only", " factor", ". ", "In", " general", ",", " a", " larger", " number", " of", " parameters", " can", " lead", " to", ": ", "1", ".", " Increased", " computational", " complexity", ":", " More", " parameters", " mean", " more", " calculations", " are", " required", " to", " process", " the", " data", ". ", "2", ".", " Increased", " memory", " requirements", ":", " Larger", " models", " require", " more", " memory", " to", " store", " their", " parameters", ",", " which", " can", " impact", " system", " performance", ". ", "However", ",", " it", "'s", " worth", " noting", " that", " the", " relationship", " between", " the", " number", " of", " parameters", " and", " speed", " is", " not", " always", " linear", ".", " Other", " factors", ",", " such", " as", ": ", "*", " Model", " architecture", " ", "*", " Optim", "izer", " choice", " ", "*", " Hyper", "parameter", " tuning", " ", "can", " also", " impact", " the", " speed", " of", " a", " neural", " network", ". ", "In", " the", " case", " of", " Johnny", " and", " Tommy", ",", " it", "'s", " difficult", " to", " say", " which", " one", "'s", " model", " will", " be", " faster", " without", " more", " information", " about", " the", " models", " themselves", "." ]
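Note how the output schema applies here: the model returns an array of string fragments ("x-cog-array-type": "iterator") that are meant to be joined in order ("x-cog-array-display": "concatenate"). A minimal Python sketch of reassembling such an output, using the first few fragments from the example above:

# Fragments as streamed by the model (abbreviated from the example above).
fragments = ["The", " number", " of", " parameters", " in", " a", " neural", " network"]

# Concatenate the fragments in order to recover the full response text.
text = "".join(fragments)
print(text)  # -> The number of parameters in a neural network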
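The top_k and top_p inputs describe standard sampling filters. As an illustrative sketch only (the real model applies these to logits inside the inference server, not to a Python dict), the two filters behave roughly like this on a toy next-token distribution:

def filter_next_token_probs(probs, top_k=0, top_p=1.0):
    # Rank tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        # Top-k filtering: keep only the k most probable tokens.
        ranked = ranked[:top_k]
    if top_p < 1.0:
        # Nucleus filtering: keep the smallest top set whose
        # cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalize the surviving probabilities to sum to 1.
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}

# Example: top_p=0.9 keeps "a", "b", "c" (0.5 + 0.3 + 0.15 >= 0.9), drops "d".
print(filter_next_token_probs({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.9))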
", "In", " the", " case", " of", " Johnny", " and", " Tommy", ",", " it", "'s", " difficult", " to", " say", " which", " one", "'s", " model", " will", " be", " faster", " without", " more", " information", " about", " the", " models", " themselves", "." ] ## Task Classification Based on the information above, please classify the model into one of the following tasks: - any-to-any: - audio-classification: Audio classification is the task of assigning a label or class to a given audio. It can be used for recognizing which command a user is giving or the emotion of a statement, as well as identifying a speaker. - audio-to-audio: Audio-to-Audio is a family of tasks in which the input is an audio and the output is one or multiple generated audios. Some example tasks are speech enhancement and source separation. - audio-text-to-text: - automatic-speech-recognition: Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text. It has many applications, such as voice user interfaces. - depth-estimation: Depth estimation is the task of predicting depth of the objects present in an image. - document-question-answering: Document Question Answering (also known as Document Visual Question Answering) is the task of answering questions on document images. Document question answering models take a (document, question) pair as input and return an answer in natural language. Models usually rely on multi-modal features, combining text, position of words (bounding-boxes) and image. - visual-document-retrieval: - feature-extraction: Feature extraction is the task of extracting features learnt in a model. - fill-mask: Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language in which the model is trained in. - graph-ml: undefined - image-classification: Image classification is the task of assigning a label or class to an entire image. Images are expected to have only one class for each image. Image classification models take an image as input and return a prediction about which class the image belongs to. - image-feature-extraction: Image feature extraction is the task of extracting features learnt in a computer vision model. - image-segmentation: Image Segmentation divides an image into segments where each pixel in the image is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. - image-to-image: Image-to-image is the task of transforming an input image through a variety of possible manipulations and enhancements, such as super-resolution, image inpainting, colorization, and more. - image-text-to-text: Image-text-to-text models take in an image and text prompt and output text. These models are also called vision-language models, or VLMs. The difference from image-to-text models is that these models take an additional text input, not restricting the model to certain use cases like image captioning, and may also be trained to accept a conversation as input. - image-to-text: Image to text models output a text from a given image. Image captioning or optical character recognition can be considered as the most common applications of image to text. - image-to-video: undefined - keypoint-detection: Keypoint detection is the task of identifying meaningful distinctive points or features in an image. 
- mask-generation: Mask generation is the task of generating masks that identify a specific object or region of interest in a given image. Masks are often used in segmentation tasks, where they provide a precise way to isolate the object of interest for further processing or analysis.
- multiple-choice:
- object-detection: Object Detection models allow users to identify objects of certain defined classes. Object detection models receive an image as input and output the images with bounding boxes and labels on detected objects.
- video-classification: Video classification is the task of assigning a label or class to an entire video. Videos are expected to have only one class for each video. Video classification models take a video as input and return a prediction about which class the video belongs to.
- other:
- question-answering: Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!
- reinforcement-learning: Reinforcement learning is the computational approach of learning from action by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback.
- robotics:
- sentence-similarity: Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information and calculate how close (similar) they are. This task is particularly useful for information retrieval and clustering/grouping.
- summarization: Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.
- table-question-answering: Table Question Answering (Table QA) is the task of answering a question about information in a given table.
- table-to-text:
- tabular-classification: Tabular classification is the task of classifying a target category (a group) based on a set of attributes.
- tabular-regression: Tabular regression is the task of predicting a numerical value given a set of attributes.
- tabular-to-text:
- text-classification: Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness.
- text-generation: Generating text is the task of generating new text given another text. These models can, for example, fill in incomplete text or paraphrase.
- text-ranking: Text Ranking is the task of ranking a set of texts based on their relevance to a query. Text ranking models are trained on large datasets of queries and relevant documents to learn how to rank documents based on their relevance to the query. This task is particularly useful for search engines and information retrieval systems.
- text-retrieval:
- text-to-image: Text-to-image is the task of generating images from input text. These pipelines can also be used to modify and edit images based on text prompts.
- text-to-speech: Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.
- text-to-audio:
- text-to-video: Text-to-video models can be used in any application that requires generating a consistent sequence of images from text.
- text2text-generation:
- time-series-forecasting:
- token-classification: Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks.
- translation: Translation is the task of converting text from one language to another.
- unconditional-image-generation: Unconditional image generation is the task of generating images without any conditioning context (such as a text prompt or another image). Once trained, the model will create images that resemble its training data distribution.
- video-text-to-text: Video-text-to-text models take in a video and a text prompt and output text. These models are also called video-language models.
- visual-question-answering: Visual Question Answering is the task of answering open-ended questions based on an image. These models output natural language responses to natural language questions.
- voice-activity-detection:
- zero-shot-classification: Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes.
- zero-shot-image-classification: Zero-shot image classification is the task of classifying previously unseen classes during training of a model.
- zero-shot-object-detection: Zero-shot object detection is a computer vision task to detect objects and their classes in images, without any prior training or knowledge of the classes. Zero-shot object detection models receive an image as input, as well as a list of candidate classes, and output the bounding boxes and labels where the objects have been detected.
- text-to-3d: Text-to-3D models take in text input and produce 3D output.
- image-to-3d: Image-to-3D models take in image input and produce 3D output.

## Output format

Return a JSON object with the following fields:

- summary: A short summary of what the model does, in 10 words or less. This should not be a sales pitch.
- inputTypes: An array of the types of inputs the model accepts, like "text", "image", "audio", etc.
- outputTypes: An array of the types of outputs the model returns, like "text", "image", "audio", etc.
- task: The task the model performs. This should be one of the Hugging Face task names listed above.

Do not include any other text in your response. Do not explain your reasoning. Just return the JSON object. No code fencing. No markdown. No backticks. No code blocks.
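For instance, a well-formed response for a text-in, text-out language model like this one might look like this (field values are illustrative, not prescribed):

{
  "summary": "Chat-tuned 8B Llama 3 model that generates text",
  "inputTypes": ["text"],
  "outputTypes": ["text"],
  "task": "text-generation"
}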