tuneapi.apis package
Submodules
tuneapi.apis.model_anthropic module
Connect to the Anthropic API to use the Claude series of LLMs.
- class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', api_token: str | None = None, extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the blocking function to chat with the model.
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the async function to chat with the model.
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- set_api_token(token: str) None
This is used to set the API token for the model.
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the async function to stream chat with the model where each token is iteratively generated
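A minimal usage sketch for the class above, assuming the API key is passed directly via api_token (the key shown is a placeholder, and streamed chunks are assumed to be plain strings for text-only prompts):
# Hedged sketch: blocking and streaming chat with the Anthropic wrapper.
from tuneapi.apis.model_anthropic import Anthropic

model = Anthropic(api_token="sk-ant-...")  # placeholder token

# Blocking call: returns the full reply for a plain-string prompt.
print(model.chat("Reply with a single word: hello"))

# Streaming call: chunks are yielded as they are generated.
for chunk in model.stream_chat("Count from 1 to 3."):
    print(chunk, end="", flush=True)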
tuneapi.apis.model_gemini module
Connect to the Google Gemini API to use their LLMs.
- class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-2.0-flash-exp', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the blocking function to chat with the model.
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the async function to chat with the model.
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False) EmbeddingGen
This is the blocking function to get embeddings for the chat
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[float, float] = (5.0, 60.0), raw: bool = False) EmbeddingGen
This is the async function to get embeddings for the chat
- set_api_token(token: str) None
This is used to set the API token for the model.
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, debug: bool = False, extra_headers: Dict[str, str] | None = None, raw: bool = False, timeout=(5, 60), **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
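A hedged sketch of the Gemini wrapper above, showing a blocking chat call and an embedding call; the api_token value is a placeholder and the internal layout of the returned EmbeddingGen object is not documented in this section:
from tuneapi.apis.model_gemini import Gemini

model = Gemini(api_token="AIza...")  # placeholder key

# Blocking chat with a plain-string prompt.
print(model.chat("Summarise the Gemini API in one sentence."))

# Embeddings for a list of strings, using the documented
# 'text-embedding-004' default model.
emb = model.embedding(["hello world", "tune api"])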
- tuneapi.apis.model_gemini.get_structured_schema(model: type[BaseModel]) Dict[str, Any]
Converts a Pydantic BaseModel to a JSON schema compatible with the Gemini API, including anyOf for optional or union types and handling nested structures correctly.
- Parameters:
model – The Pydantic BaseModel class to convert.
- Returns:
A dictionary representing the JSON schema.
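A short illustration of get_structured_schema with a hypothetical Pydantic model; the Person model and its fields are made up for the example:
from typing import Optional
from pydantic import BaseModel

from tuneapi.apis.model_gemini import get_structured_schema

class Person(BaseModel):
    name: str
    age: Optional[int] = None  # optional field, expressed via anyOf in the schema

schema = get_structured_schema(Person)
print(schema)  # JSON-schema dict suitable for Gemini structured output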
tuneapi.apis.model_openai module
Connect to the OpenAI API and use their LLMs.
- class tuneapi.apis.model_openai.Groq(id: str = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases:
OpenAIProtocol
A class to interact with Groq’s Large Language Models (LLMs) via their API. Note that this class does not provide an embedding method.
- id
Identifier for the Groq model.
- Type:
str
- base_url
The base URL for the Groq API. Defaults to “https://api.groq.com/openai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Groq API documentation at https://console.groq.com/
- embedding(**k)
If you pass a list, the returned items are in insertion order.
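Since Groq (and the Mistral class below) implement OpenAIProtocol, usage mirrors the other wrappers; a brief hedged sketch with a placeholder token:
from tuneapi.apis.model_openai import Groq

groq = Groq(api_token="gsk_...")  # placeholder token
print(groq.chat("Name one use of a hash map."))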
- class tuneapi.apis.model_openai.Mistral(id: str = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases:
OpenAIProtocol
A class to interact with Mistral’s Large Language Models (LLMs) via their API. Note that this class does not provide an embedding method.
- id
Identifier for the Mistral model.
- Type:
str
- base_url
The base URL for the Mistral API. Defaults to “https://api.mistral.ai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
- embedding(*a, **k)
Raises NotImplementedError as Mistral does not support embeddings.
Note
For more information, visit the Mistral API documentation at https://console.mistral.ai/
- class tuneapi.apis.model_openai.OpenAIProtocol(id: str, base_url: str, extra_headers: Dict[str, str] | None, api_token: str | None, emebdding_url: str | None, image_gen_url: str | None, audio_transcribe_url: str | None, audio_gen_url: str | None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the blocking function to chat with the model.
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the async function to chat with the model.
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, raw: bool = False, extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60)) EmbeddingGen
If you pass a list, the returned items are in insertion order.
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (10, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None) EmbeddingGen
If you pass a list, the returned items are in insertion order.
- image_gen(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen
This is the blocking function to generate images
- async image_gen_async(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) PIL.Image
This is the async function to generate images
- set_api_token(token: str) None
This is used to set the API token for the model.
- speech_to_text(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], **kwargs) Transcript
Translates audio using the OpenAI API. Unfortunately, I couldn’t figure out how to get this working with the Python requests library, so I’m using the openai library instead. For both our sakes, let’s hope openai stays stable long enough.
- Parameters:
prompt (str) – The instruction prompt to guide the translation.
audio (str) – The path to the audio file to translate.
model (str) – The model to use for translation.
response_format (str) – The format of the response. Possible values are “json”, “text”, “srt”, “verbose_json”, or “vtt”. Defaults to “json”.
timestamp_granularities (List[str]) – The timestamp granularities to include in the response. Defaults to [“segment”].
- Returns:
The translated text as a string, or None if an error occurs.
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
- text_to_speech(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
This is the blocking function to generate speech audio from text.
- async text_to_speech_async(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
This is the async function to generate speech audio from text.
- class tuneapi.apis.model_openai.Openai(id: str = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None, image_gen_url: str | None = None, audio_transcribe: str | None = None, audio_gen_url: str | None = None)
Bases:
OpenAIProtocol
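A hedged sketch that exercises several OpenAIProtocol methods through the concrete Openai class above; the token, prompts, and output path are placeholders, and the exact shapes of EmbeddingGen and ImageGen are not documented in this section:
from tuneapi.apis.model_openai import Openai

oai = Openai(api_token="sk-...")  # placeholder token

# Blocking chat.
print(oai.chat("Give me a one-line definition of an LLM."))

# Embeddings: list input keeps insertion order in the result.
emb = oai.embedding(["first sentence", "second sentence"])

# Image generation with the documented DALL-E 3 defaults.
img = oai.image_gen("a watercolor fox", size="1024x1024")

# Text-to-speech returns raw audio bytes.
audio = oai.text_to_speech("Hello from tuneapi", voice="shimmer", response_format="wav")
with open("hello.wav", "wb") as f:
    f.write(audio)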
- class tuneapi.apis.model_openai.TuneModel(id: str = 'meta/llama-3.1-8b-instruct', base_url: str = 'https://proxy.tune.app/chat/completions', org_id: str | None = None, extra_headers: Dict[str, str] | None = None, api_token: str | None = None)
Bases:
OpenAIProtocol
A class to interact with Tune’s Large Language Models (LLMs) via their API.
- id
Identifier for the Tune model.
- Type:
str
- base_url
The base URL for the Tune API. Defaults to “https://proxy.tune.app/chat/completions”.
- Type:
str
- org_id
Organization ID for the Tune API.
- Type:
Optional[str]
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Tune documentation at https://tune.app/
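A brief hedged sketch of the TuneModel class, showing the org_id parameter that distinguishes it from the other OpenAIProtocol subclasses; both values are placeholders:
from tuneapi.apis.model_openai import TuneModel

model = TuneModel(
    id="meta/llama-3.1-8b-instruct",
    org_id="my-org",       # placeholder organization ID
    api_token="tune-...",  # placeholder token
)
print(model.chat("What is the capital of France?"))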
tuneapi.apis.turbo module
- tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, **kwargs) List | Tuple[List, Usage]
Distributes multiple chat prompts across a thread pool for parallel processing.
This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.
- Args:
- model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model instance to ensure thread safety.
- prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these prompts.
- post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None, raw responses are stored. Function signature should be: f(chat_response) -> processed_response
- max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and system capabilities.
- retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.
- pbar (bool, default=True): Whether to display a progress bar.
- Returns:
- List[Any]: A list of responses or errors, maintaining the same order as the input prompts. Successful responses will be either raw or processed (if post_logic is provided); failed requests (after retries) will contain the last error encountered.
- Raises:
- ValueError: If max_threads < 1 or retry < 0
- TypeError: If model is not an instance of ModelInterface
- Example:
>>> model = ChatModel(api_token="...")
>>> prompts = [
...     Thread([Message("What is 2+2?")]),
...     Thread([Message("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}\nA: {response}")
- Note:
- Each worker thread gets its own model instance to prevent sharing state
- Progress bar shows both initial processing and retries
- The function maintains thread safety through message passing channels
- async tuneapi.apis.turbo.distributed_chat_async(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, **kwargs)
This is the async function to chat with the model in a distributed manner; it mirrors distributed_chat above.
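A hedged async counterpart of the Example above; the Thread and Message import path is assumed to be tuneapi.types (it is not shown in this section), and the single-argument Message(...) call simply follows the blocking example:
import asyncio

from tuneapi.apis.model_openai import Openai
from tuneapi.apis.turbo import distributed_chat_async
from tuneapi.types import Thread, Message  # assumed import path

async def main():
    model = Openai(api_token="sk-...")  # placeholder token
    prompts = [
        Thread([Message("What is 2+2?")]),
        Thread([Message("What is Python?")]),
    ]
    responses = await distributed_chat_async(model, prompts, max_threads=5)
    for response in responses:
        print(response)

asyncio.run(main())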