Generating Text via the API

You can generate a text completion for a given text prompt by sending an HTTP POST request to the complete endpoint of the desired language model. The request contains the input text, called a prompt, and various parameters controlling the generation. For authentication, you must include your API key in the request headers. A complete response contains the tokenized prompt, the generated text(s), called completion(s), and various metadata.

The request and response specifications are documented in full below. The following example shows a complete request to j1-large, in Python, JavaScript and curl, along with the corresponding response:

# Python
import requests

requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like", 
        "numResults": 1, 
        "maxTokens": 8, 
        "stopSequences": ["."],
        "topKReturn": 0,
        "temperature": 0.0
    }
)

// JavaScript
fetch("https://api.ai21.com/studio/v1/j1-large/complete", {
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
      "prompt": "Life is like",
      "numResults": 1,
      "maxTokens": 8,
      "stopSequences": ["."],
      "topKReturn": 0,
      "temperature": 0.0
  }),
  method: "POST"
});

# curl
curl https://api.ai21.com/studio/v1/j1-large/complete \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -X POST \
    --data-raw '{
        "prompt": "Life is like", 
        "numResults": 1, 
        "maxTokens": 8, 
        "stopSequences": ["."],
        "topKReturn": 0,
        "temperature": 0.0
    }'

Response:
{
  "completions": [
    {
      "data": {
        "text": " a box of chocolates",
        "tokens": [
          {
            "generatedToken": {"logprob": -1.4022408723831177, "token": "▁a▁box▁of"},
            "textRange": {"end": 9, "start": 0},
            "topTokens": null
          },
          {
            "generatedToken": {"logprob": -0.15134188532829285, "token": "▁chocolates"},
            "textRange": {"end": 20, "start": 9},
            "topTokens": null
          }, 
          {
            "generatedToken": {"logprob": -1.081266164779663, "token": "."},
            "textRange": {"end": 21, "start": 20},
            "topTokens": null
          }
        ]
      },
      "finishReason": {"reason": "stop", "sequence": "."}
    }
  ],
  "id": "1234",
  "prompt": {
    "text": "Life is like",
    "tokens": [
      {
        "generatedToken": {"logprob": -9.73992919921875, "token": "▁Life▁is"},
        "textRange": {"end": 7, "start": 0},
        "topTokens": null
      },
      {
        "generatedToken": {"logprob": -3.7175867557525635, "token": "▁like"},
        "textRange": {"end": 12, "start": 7},
        "topTokens": null
      }
    ]
  }
}

Request

URL

Generate a completion by performing a POST request to the following URL:

https://api.ai21.com/studio/v1/{model}/complete

Where model is either j1-large, j1-grande, j1-jumbo, or the name of one of your custom models.

Authorization Headers

Your requests must include your personal API key in an Authorization header, as follows:

Authorization: Bearer YOUR_API_KEY

You can find your personal API key on the account page. Your API key must be kept secret: do not share it with others or expose it in any data, configuration, or code accessible to clients. You can revoke an existing API key and replace it with a new one on the account page.
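
For example, a minimal sketch of keeping the key out of your code by reading it from an environment variable (the variable name AI21_API_KEY is an arbitrary choice for illustration, not an API requirement):

import os
import requests

# Read the key from the environment so it never appears in source control.
# AI21_API_KEY is a hypothetical variable name of your choosing.
api_key = os.environ["AI21_API_KEY"]

requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"prompt": "Life is like", "maxTokens": 8}
)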

Payload

The request payload should carry the following fields:


prompt | string

The text which the model is requested to continue. Required.

The length of the text should be no more than 2047 tokens.


numResults | 1 <= integer <= 16

Number of completions to sample and return. Optional, default = 1.

A value greater than 1 is meaningful only when using non-greedy decoding, i.e. temperature > 0.
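
For example, a minimal sketch of sampling three alternative completions (parameter values are illustrative, and the outputs will vary between runs):

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like",
        "numResults": 3,
        "maxTokens": 8,
        "temperature": 0.7
    }
)
>>> for completion in resp.json()['completions']:
...     print(completion['data']['text'])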


maxTokens | integer >= 0

The maximum number of tokens to generate per result. Optional, default = 16.

If no stopSequences are given, generation is stopped after producing maxTokens.


minTokens | integer >= 0

The minimum number of tokens to generate per result. Optional, default = 0.

If stopSequences are given, they are ignored until minTokens are generated.


temperature | 0 <= float <= 5.0

Modifies the distribution from which tokens are sampled. Optional, default = 1.0.

Setting temperature to 1.0 samples directly from the model distribution. Lower values increase the chance of sampling higher-probability tokens, while higher values increase the chance of sampling lower-probability tokens. A value of 0 essentially disables sampling and results in greedy decoding, where the most likely token is chosen at every step.


topP | 0 <= float <= 1.0

Sample tokens from the corresponding top percentile of probability mass. Optional, default = 1.0.

For example, a value of 0.9 will only consider tokens comprising the top 90% probability mass.
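
For instance, a minimal sketch of nucleus sampling at the default temperature (the value 0.9 is illustrative):

>>> requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like",
        "maxTokens": 8,
        "temperature": 1.0,
        "topP": 0.9
    }
)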


stopSequences | list of strings

Stops decoding if any of the strings is generated. Optional.

For example, to stop at a period or a new line, use [".", "\n"]. The decoded result text will not include the stop sequence string, but it will be included in the raw token data, which can also continue beyond the stop sequence if the sequence ended in the middle of a token. The sequence which triggered the termination is included in the finishReason of the response.
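
A minimal sketch of inspecting the triggering sequence, mirroring the full example at the top of this page:

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like",
        "maxTokens": 8,
        "temperature": 0,
        "stopSequences": ["."]
    }
)
>>> resp.json()['completions'][0]['finishReason']
{'reason': 'stop', 'sequence': '.'}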


topKReturn | 0 <= integer <= 64

Return the top-K alternative tokens. Optional, default = 0.

When using a non-zero value, the response includes the string representations and logprobs for each of the top-K alternatives at each position, in the prompt and in the completions.
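
For example, a minimal sketch of reading the alternatives for the first generated token (topKReturn=3 is illustrative):

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like",
        "maxTokens": 4,
        "temperature": 0,
        "topKReturn": 3
    }
)
>>> first_token = resp.json()['completions'][0]['data']['tokens'][0]
>>> for alt in first_token['topTokens']:  # 3 alternatives, sorted by probability
...     print(alt['token'], alt['logprob'])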


presencePenalty | PenaltyData

Applies a fixed bias against generating tokens that appeared at least once in the prompt or in the completion. Read more about repetition penalties below. Optional. No penalty is applied by default.


countPenalty | PenaltyData

Applies a bias against generating tokens that appeared in the prompt or in the completion, proportional to the number of respective appearances. Read more about repetition penalties below. Optional. No penalty is applied by default.


frequencyPenalty | PenaltyData

Applies a bias against generating tokens that appeared in the prompt or in the completion, proportional to the frequency of respective appearances in the text. Read more about repetition penalties below. Optional. No penalty is applied by default.

logitBias | dictionary

Adjust the probability of specific tokens being generated. Optional.

Pass a dictionary mapping from strings to floats, where the strings are text representations of the tokens and the floats are the biases themselves. A positive bias increases the generation probability of a given token and a negative bias decreases it. Read more about logit bias below.

PenaltyData

Each repetition penalty is characterized by a PenaltyData data structure containing the following fields:


scale | float

Controls the magnitude of the penalty. Required.

A positive penalty value implies reducing the probability of repetition. Larger values correspond to a stronger bias against repetition.


applyToWhitespaces | boolean

Apply the penalty to whitespaces and newlines. Optional, default=True.

Determines whether the penalty is applied to the following tokens:

'▁', '▁▁', '▁▁▁▁', '<|newline|>'

applyToPunctuations | boolean

Apply the penalty to punctuation. Optional, default=True.

Determines whether the penalty is applied to tokens containing punctuation characters and whitespaces, such as ; , !!! or ▁\\[[@.


applyToNumbers | boolean

Apply the penalty to numbers. Optional, default=True.

Determines whether the penalty is applied to purely-numeric tokens, such as 2022 or 123. Tokens that contain numbers and letters, such as 20th, are not affected by this parameter.


applyToStopwords | boolean

Apply the penalty to stop words. Optional, default=True.

Determines whether the penalty is applied to tokens that are NLTK English stopwords or multi-word combinations of these words, such as are , nor and ▁We▁have.


applyToEmojis | boolean

Apply the penalty to emojis. Optional, default=False.

Determines whether the penalty is applied to any of approximately 650 common emojis in the Jurassic-1 vocabulary.

Response

The response is a nested data structure as described below. At its top level, the response has the following fields:


id
A unique string id for the processed request. Repeated identical requests get different ids.

prompt

The prompt, including the raw text, the tokens with their logprobs and the top-K alternative tokens at each position, if requested.

Has two nested fields:

  • text, the raw prompt string.
  • tokens, a list of TokenData for the tokenized prompt.

completions

List of completions, including raw text, tokens and logprobs. The number of completions corresponds to requested numResults.

Each completion has two nested fields (see the sketch below):

  • data, containing text (string) and tokens (list of TokenData) for the completion.
  • finishReason, a nested data structure describing the reason generation was terminated in this completion.
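
Putting it together, a minimal sketch of reading these fields from a parsed response (resp is a requests response, as in the examples above):

>>> data = resp.json()
>>> data['prompt']['text']
'Life is like'
>>> for completion in data['completions']:
...     print(completion['data']['text'], '|', completion['finishReason']['reason'])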

TokenData

Both the prompt and each of the completions provide lists of TokenData, where each entry describes a token and, if requested, its top-K alternatives. An instance of TokenData contains the following fields:


generatedToken

Has two nested fields:

  • token - the string representation of the token.
  • logprob - the predicted log probability of the token (float).

topTokens

A list of the top-K alternative tokens for this position, sorted by probability, where K is set by the topKReturn request parameter; null if topKReturn=0.

Each token in the list has a token (string) field and a logprob (float) field.


textRange
The start and end offsets of this token in the decoded text string.

Special Tokens and Whitespaces

The ▁ character (\u2581) in the tokens is used by our tokenizer to represent a single whitespace or tab symbol. Sequences of 2 and 4 consecutive spaces (either regular whitespaces or tabs) have their own tokens, ▁▁ and ▁▁▁▁ respectively.

Note that since tokenization adds a dummy space at the start of each line for consistency, the result text is not simply a concatenation of all tokens with ▁ replaced with a space. For example:

>>> res = requests.post("...", json={"prompt": "This is the 1st line\nThis is the 2nd line", 
                                     "temperature": 0, "maxTokens": 16})
>>> res.status_code
200
>>> data = res.json()
>>> data['completions'][0]['data']['text']
'\nThis is the 3rd line\nThis is the 4th line\nThis is the 5th line\n'
>>> tokens = [t['generatedToken']['token'] for t in data['completions'][0]['data']['tokens']]
>>> "".join(tokens)
'<|newline|>▁This▁is▁the▁3rd▁line<|newline|>▁This▁is▁the▁4th▁line<|newline|>▁This▁is▁the▁5th▁line<|newline|>'
>>> "".join(tokens).replace("▁"," ").replace("<|newline|>", "\n")
'\n This is the 3rd line\n This is the 4th line\n This is the 5th line\n'

Each token's textRange field can be used to map it to its corresponding span in the result text. Note that the text field of the prompt in the response may differ from the text sent in the request, if it contains special symbols which behave differently after tokenization. In this case the textRange fields always refer to the text in the response.
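
For example, a minimal sketch of mapping each completion token to its span in the result text (data is the parsed response from the sketch above):

>>> text = data['completions'][0]['data']['text']
>>> for t in data['completions'][0]['data']['tokens']:
...     r = t['textRange']
...     print(repr(t['generatedToken']['token']), '->', repr(text[r['start']:r['end']]))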

Repetition penalties

Repetition penalties can be used to counteract the model's tendency to repeat prompt text verbatim and/or get stuck in a loop. This is accomplished by adjusting the token probabilities at each generation step, such that tokens that already appeared in the text (either in the prompt or in the completion) are less likely to be generated again.

There are three kinds of repetition penalties: presencePenalty, countPenalty and frequencyPenalty. One or more penalties can be used, and the magnitude of each can be controlled independently via their respective API parameters (not available in the web playground yet). Reasonable scale values to explore are 0-5 for the presence and count penalties, and 0-500 for the frequency penalty.

The following example introduces all three penalties simultaneously (though each individually would have sufficed to prevent repetition):

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Hi there\nHi There\nHi there\n",
        "numResults": 1,
        "maxTokens": 10,
        "temperature": 0.0,
        "presencePenalty": {'scale': 5.0},
        "countPenalty": {'scale': 2.0},
        "frequencyPenalty": {'scale': 42.7},
    }
)
>>> resp.json()['completions'][0]['data']['text']
'I am a professional writer with 5 years of experience. I have completed'

In addition to controlling the penalty scale, the API allows toggling penalties on and off for five special categories of tokens: whitespaces (including newlines), punctuation, numbers, stopwords (including multi-word combinations of stopwords) and emojis. For example:

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "123123123", 
        "maxTokens": 10,
        "temperature": 0
    }
)
>>> print(resp.json()['completions'][0]['data']['text'])  # The model repeats '123' as expected
123123123123123123123123123123

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "123123123", 
        "maxTokens": 10,
        "temperature": 0,
        "presencePenalty": {
            "scale": 5
        }
    }
)
>>> print(resp.json()['completions'][0]['data']['text'])  # Presence penalty prevents repetition

What is the next term in -1, -2

>>> resp = requests.post(
    url,              # same endpoint as in the previous requests
    headers=headers,  # same Authorization header as above
    json={
        "prompt": "123123123",
        **params,     # same maxTokens and temperature as above
        "presencePenalty": {
            "scale": 5,
            "applyToNumbers": False  # The default is True
        }
    }
)
>>> print(resp.json()['completions'][0]['data']['text'])  # Unless we exclude numbers from the penalty
123123123123123123123123123123

Logit bias

Logit biases can be used to promote or suppress the generation of specific tokens. This is accomplished by adding a bias term to each token's respective logits, where a positive bias increases generation probability and a negative bias decreases it.

Note that logit bias operates at the token level, so you must refer to valid tokens in the Jurassic-1 vocabulary, otherwise the API returns an error. Watch out for whitespaces, which are replaced with the special ▁ character in our string representation of tokens (see above).

The following example introduces a large negative bias to avoid generating the expected continuation "a box of":

>>> resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Life is like",
        "numResults": 1,
        "maxTokens": 8,
        "topKReturn": 0,
        "temperature": 0.0,
        "logitBias": {"▁a▁box▁of": -100.0}
    }
)
>>> resp.json()['completions'][0]['data']['text']
' riding a bicycle. To keep your balance, you must'