tuneapi.apis package
Submodules
tuneapi.apis.model_anthropic module
Connect to the Anthropic API to use the Claude series of LLMs.
- class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', api_token: str | None = None, extra_headers: Dict[str, str] | None = None)
Bases: ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- set_api_token(token: str) None
This is used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the async function to stream chat with the model where each token is iteratively generated
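A minimal usage sketch for the Anthropic class above, assuming a valid Anthropic API key; the placeholder token and prompts are illustrative, and the streaming loop assumes the returned iterator yields text chunks.

from tuneapi.apis.model_anthropic import Anthropic

# build the client; the token below is a placeholder
model = Anthropic(id="claude-3-haiku-20240307", api_token="sk-ant-...")

# blocking chat: accepts a Thread or a plain string prompt
print(model.chat("Give a one-line definition of entropy.", max_tokens=256))

# streaming chat: tokens are yielded as they are generated
for token in model.stream_chat("Count from 1 to 5."):
    print(token, end="", flush=True)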
tuneapi.apis.model_gemini module
Connect to the Google Gemini API to use their LLMs. See the Gemini documentation for more details.
- class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-2.0-flash-exp', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None)
Bases: ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False) EmbeddingGen
This is the blocking function to get embeddings for the chat
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[float, float] = (5.0, 60.0), raw: bool = False) EmbeddingGen
This is the async function to get embeddings for the chat
- set_api_token(token: str) None
This is used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, debug: bool = False, extra_headers: Dict[str, str] | None = None, raw: bool = False, timeout=(5, 60), **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
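A minimal sketch of the Gemini class above for blocking chat and embeddings, assuming a valid Google API key; the placeholder token and prompts are illustrative.

from tuneapi.apis.model_gemini import Gemini

model = Gemini(id="gemini-2.0-flash-exp", api_token="AIza...")  # placeholder key

# blocking chat with a plain string prompt
print(model.chat("Summarise the water cycle in two sentences."))

# embeddings for a list of strings; returns an EmbeddingGen
emb = model.embedding(["hello world", "tuneapi makes LLM calls simple"], model="text-embedding-004")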
- tuneapi.apis.model_gemini.get_structured_schema(model: type[BaseModel]) Dict[str, Any]
Converts a Pydantic BaseModel to a JSON schema compatible with Gemini API, including anyOf for optional or union types and handling nested structures correctly.
- Parameters:
model – The Pydantic BaseModel class to convert.
- Returns:
A dictionary representing the JSON schema.
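A short sketch of get_structured_schema with a nested Pydantic model; the Address and Person models are illustrative.

from typing import List, Optional
from pydantic import BaseModel
from tuneapi.apis.model_gemini import get_structured_schema

class Address(BaseModel):
    city: str
    zip_code: Optional[str] = None  # optional field, expressed with anyOf in the schema

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]  # nested structure

# returns a plain dict JSON schema suitable for Gemini structured output
schema = get_structured_schema(Person)
print(schema)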
tuneapi.apis.model_openai module
Connect to the OpenAI API and use their LLMs.
- class tuneapi.apis.model_openai.Groq(id: str = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases: OpenAIProtocol
A class to interact with Groq’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.
- id
Identifier for the Groq model.
- Type:
str
- base_url
The base URL for the Groq API. Defaults to “https://api.groq.com/openai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Groq API documentation at https://console.groq.com/
- embedding(**k)
If you pass a list, the returned items are in insertion order
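A minimal sketch of the Groq class above, which reuses the OpenAI-protocol chat methods; the placeholder token and prompt are illustrative.

from tuneapi.apis.model_openai import Groq

model = Groq(id="llama3-70b-8192", api_token="gsk_...")  # placeholder key
print(model.chat("Name three prime numbers.", max_tokens=128))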
- class tuneapi.apis.model_openai.Mistral(id: str = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases: OpenAIProtocol
A class to interact with Mistral’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.
- id
Identifier for the Mistral model.
- Type:
str
- base_url
The base URL for the Mistral API. Defaults to “https://api.mistral.ai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
- embedding(*a, **k)
Raises NotImplementedError as Mistral does not support embeddings.
Note
For more information, visit the Mistral API documentation at https://console.mistral.ai/
- embedding(**k)
If you pass a list, the returned items are in insertion order
- class tuneapi.apis.model_openai.OpenAIProtocol(id: str, base_url: str, extra_headers: Dict[str, str] | None, api_token: str | None, emebdding_url: str | None, image_gen_url: str | None, audio_transcribe_url: str | None, audio_gen_url: str | None)
Bases: ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, raw: bool = False, extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60)) EmbeddingGen
If you pass a list, the returned items are in insertion order
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (10, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None) EmbeddingGen
If you pass a list, the returned items are in insertion order
- image_gen(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen
This is the blocking function to generate images
- async image_gen_async(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) PIL.Image
This is the async function to generate images
- set_api_token(token: str) None
This is used to set the API token for the model
- speech_to_text(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], **kwargs) Transcript
Translates audio using the OpenAI API. Unfortunately, I couldn’t figure out how to get this working with the python requests library, so I’m using the openai library instead. For both our sakes, let’s hope openai stays stable long enough.
- Parameters:
prompt (str) – The instruction prompt to guide the translation.
audio (str) – The path to the audio file to translate.
model (str) – The model to use for translation.
response_format (str) – The format of the response. Possible values are “json”, “text”, “srt”, “verbose_json”, or “vtt”. Defaults to “json”.
timestamp_granularities (List[str]) – The timestamp granularities to include in the response. Defaults to [“segment”].
- Returns:
The translated text as a string, or None if an error occurs.
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
- text_to_speech(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
- async text_to_speech_async(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
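A sketch of the audio helpers above, shown through the concrete Openai subclass documented below; it assumes a valid OpenAI key (and the openai package for speech_to_text), and the placeholder token and file path are illustrative.

from tuneapi.apis.model_openai import Openai

model = Openai(api_token="sk-...")  # placeholder key

# text_to_speech returns raw audio bytes in the requested format
audio_bytes = model.text_to_speech("Hello from tuneapi.", voice="shimmer", response_format="wav")
with open("hello.wav", "wb") as f:
    f.write(audio_bytes)

# speech_to_text returns a Transcript for the given audio file path
transcript = model.speech_to_text(prompt="Greeting clip.", audio="hello.wav", model="whisper-1")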
- class tuneapi.apis.model_openai.Openai(id: str = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None, image_gen_url: str | None = None, audio_transcribe: str | None = None, audio_gen_url: str | None = None)
Bases: OpenAIProtocol
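A minimal sketch of the Openai class above for chat, embeddings, and image generation, assuming a valid OpenAI key; the placeholder token and prompts are illustrative.

from tuneapi.apis.model_openai import Openai

model = Openai(id="gpt-4o", api_token="sk-...")  # placeholder key

# blocking chat with a plain string prompt
print(model.chat("Write a haiku about compilers.", max_tokens=128))

# embeddings: a list of strings comes back in insertion order (EmbeddingGen)
emb = model.embedding(["first sentence", "second sentence"])

# image generation; returns an ImageGen wrapping the generated image(s)
img = model.image_gen("A watercolor fox in the snow", size="1024x1024")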
- class tuneapi.apis.model_openai.TuneModel(id: str = 'meta/llama-3.1-8b-instruct', base_url: str = 'https://proxy.tune.app/chat/completions', org_id: str | None = None, extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases: OpenAIProtocol
A class to interact with Large Language Models (LLMs) hosted behind the Tune API.
- id
Identifier for the Tune model.
- Type:
str
- base_url
The base URL for the Tune API. Defaults to “https://proxy.tune.app/chat/completions”.
- Type:
str
- org_id
Organization ID for the Tune API.
- Type:
Optional[str]
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Tune API documentation at https://tune.app/
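A minimal sketch of the TuneModel class above; the placeholder token is illustrative and org_id is only needed for organisation-scoped accounts.

from tuneapi.apis.model_openai import TuneModel

model = TuneModel(id="meta/llama-3.1-8b-instruct", api_token="tune-...", org_id=None)
print(model.chat("What is the capital of France?"))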
tuneapi.apis.turbo module
- tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, **kwargs) List | Tuple[List, Usage]
Distributes multiple chat prompts across a thread pool for parallel processing.
This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.
- Args:
- model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model instance to ensure thread safety.
- prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these prompts.
- post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None, raw responses are stored. Function signature should be: f(chat_response) -> processed_response
- max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and system capabilities.
- retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.
- pbar (bool, default=True): Whether to display a progress bar.
- Returns:
- List[Any]: A list of responses or errors, maintaining the same order as the input prompts. Successful responses will be either raw or processed (if post_logic is provided). Failed requests (after retries) will contain the last error encountered.
- Raises:
- ValueError: If max_threads < 1 or retry < 0
- TypeError: If model is not an instance of ModelInterface
- Example:
>>> model = ChatModel(api_token="...")
>>> prompts = [
...     Thread([Message("What is 2+2?")]),
...     Thread([Message("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}\nA: {response}")
- Note:
- Each worker thread gets its own model instance to prevent sharing state
- Progress bar shows both initial processing and retries
- The function maintains thread safety through message passing channels
- async tuneapi.apis.turbo.distributed_chat_async(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, **kwargs)
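A sketch of running the async variant with asyncio. The import path tuneapi.types for Thread and Message is an assumption, Thread/Message construction mirrors the doctest in the sync variant’s docstring, and the Openai token is a placeholder; post_logic here just stringifies each response.

import asyncio

from tuneapi.apis.model_openai import Openai
from tuneapi.apis.turbo import distributed_chat_async
from tuneapi.types import Thread, Message  # assumed import path for the chat types

async def main():
    model = Openai(api_token="sk-...")  # placeholder key
    prompts = [
        Thread([Message("What is 2+2?")]),
        Thread([Message("What is Python?")]),
    ]
    # post_logic converts each raw response to a string before it is stored
    responses = await distributed_chat_async(
        model, prompts, post_logic=lambda r: str(r), max_threads=5
    )
    for prompt, response in zip(prompts, responses):
        print(prompt, "->", response)

asyncio.run(main())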