Create Chat Completion

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

messages

Messages · array

required

ChatCompletionDeveloperMessageParam
ChatCompletionSystemMessageParam
ChatCompletionUserMessageParam
ChatCompletionAssistantMessageParam
ChatCompletionToolMessageParam
ChatCompletionFunctionMessageParam
CustomChatCompletionMessageParam

Show child attributes

model

string | null

frequency_penalty

number | null

default:0

logit_bias

object | null

Show child attributes

logprobs

boolean | null

default:false

top_logprobs

integer | null

default:0

max_tokens

integer | null

deprecated

max_completion_tokens

integer | null

default:1

presence_penalty

number | null

default:0

response_format

object | null

ResponseFormat
StructuralTagResponseFormat

Show child attributes

seed

integer | null

Required range: -9223372036854776000 <= x <= 9223372036854776000

stop

default:[]

stream

boolean | null

default:false

stream_options

object | null

Show child attributes

temperature

number | null

top_p

number | null

tools

ChatCompletionToolsParam · object[] | null

Show child attributes

tool_choice

default:none

Allowed value: "none"

parallel_tool_calls

boolean | null

default:false

user

string | null

best_of

integer | null

use_beam_search

boolean

default:false

top_k

integer | null

min_p

number | null

repetition_penalty

number | null

length_penalty

number

default:1

stop_token_ids

integer[] | null

include_stop_str_in_output

boolean

default:false

ignore_eos

boolean

default:false

min_tokens

integer

default:0

skip_special_tokens

boolean

default:true

spaces_between_special_tokens

boolean

default:true

truncate_prompt_tokens

integer | null

Required range: x >= 1

prompt_logprobs

integer | null

allowed_token_ids

integer[] | null

bad_words

string[]

echo

boolean

default:false

If true, the new message will be prepended with the last message if they belong to the same role.

add_generation_prompt

boolean

default:true

If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.

continue_final_message

boolean

default:false

If this is set, the chat will be formatted so that the final message in the chat is open-ended, without any EOS tokens. The model will continue this message rather than starting a new one. This allows you to "prefill" part of the model's response for it. Cannot be used at the same time as add_generation_prompt.

add_special_tokens

boolean

default:false

If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).

documents

Documents · object[] | null

A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.

Show child attributes

chat_template

string | null

A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

chat_template_kwargs

object | null

Additional keyword args to pass to the template renderer. Will be accessible by the chat template.

mm_processor_kwargs

object | null

Additional kwargs to pass to the HF processor.

guided_json

If specified, the output will follow the JSON schema.

guided_regex

string | null

If specified, the output will follow the regex pattern.

guided_choice

string[] | null

If specified, the output will be exactly one of the choices.

guided_grammar

string | null

If specified, the output will follow the context free grammar.

structural_tag

string | null

If specified, the output will follow the structural tag schema.

guided_decoding_backend

string | null

If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'

guided_whitespace_pattern

string | null

If specified, will override the default whitespace pattern for guided json decoding.

priority

integer

default:0

The priority of the request (lower means earlier handling; default: 0). Any priority other than 0 will raise an error if the served model does not use priority scheduling.

request_id

string

The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used through out the inference process and return in response.

logits_processors

Logits Processors · array

A list of either qualified names of logits processors, or constructor objects, to apply when sampling. A constructor is a JSON object with a required 'qualname' field specifying the qualified name of the processor class/factory, and optional 'args' and 'kwargs' fields containing positional and keyword arguments. For example: {'qualname': 'my_module.MyLogitsProcessor', 'args': [1, 2], 'kwargs': {'param': 'value'}}.

Show child attributes

return_tokens_as_token_ids

boolean | null

If specified with 'logprobs', tokens are represented as strings of the form 'token_id:{token_id}' so that tokens that are not JSON-encodable can be identified.

cache_salt

string | null

If specified, the prefix cache will be salted with the provided string to prevent an attacker to guess prompts in multi-user environments. The salt should be random, protected from access by 3rd parties, and long enough to be unpredictable (e.g., 43 characters base64-encoded, corresponding to 256 bit). Not supported by vLLM engine V0.

kv_transfer_params

object | null

KVTransfer parameters used for disaggregated serving.

vllm_xargs

object | null

Additional request parameters with string or numeric values, used by custom extensions.

Show child attributes

Response

Successful Response

model

string

required

choices

ChatCompletionResponseChoice · object[]

required

Show child attributes

usage

object

required

Show child attributes

string

object

string

default:chat.completion

Allowed value: "chat.completion"

created

integer

service_tier

enum<string> | null

Available options:

auto,

default,

flex,

scale,

priority

system_fingerprint

string | null

prompt_logprobs

Prompt Logprobs · array

Show child attributes

kv_transfer_params

object | null

KVTransfer parameters.

Serverless RL

API Reference

Authorizations

Body

Response