tuneapi.apis package

Submodules

tuneapi.apis.model_anthropic module

Connect to the Anthropic API to use the Claude series of LLMs.

class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, return_message: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
set_api_token(token: str) → None

This is used to set the API token for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, timeout=(5, 30), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) → Any

This is the main function for streaming chat with the model, where each token is generated iteratively.

tuneapi.apis.model_anthropic.get_section(tag: str, out: str) → str | None
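
A minimal usage sketch for the Anthropic client documented above. The environment variable name is an assumption, and a plain string is passed where a full Thread could equally be used:

>>> import os
>>> from tuneapi.apis.model_anthropic import Anthropic
>>> model = Anthropic()  # uses the default id "claude-3-haiku-20240307"
>>> model.set_api_token(os.environ["ANTHROPIC_TOKEN"])  # assumed environment variable name
>>> # blocking chat: returns the full completion in one call
>>> print(model.chat("Summarise the Claude model family in one sentence.", max_tokens=256))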

tuneapi.apis.model_gemini module

Connect to the Google Gemini API to use their LLMs. See the Gemini documentation for more information.

class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-1.5-flash', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) → Any

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
set_api_token(token: str) → None

This is used to set the API token for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)

This is the main function for streaming chat with the model, where each token is generated iteratively.
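
A streaming sketch for the Gemini client, assuming stream_chat yields text chunks that can be printed as they arrive; the environment variable name is also an assumption:

>>> import os
>>> from tuneapi.apis.model_gemini import Gemini
>>> model = Gemini()  # uses the default id "gemini-1.5-flash"
>>> # the API key can be passed per call instead of via set_api_token
>>> for chunk in model.stream_chat(
...     "Write a two-line poem about monsoons.",
...     token=os.environ["GEMINI_TOKEN"],  # assumed environment variable name
... ):
...     print(chunk, end="", flush=True)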

tuneapi.apis.model_groq module

Connect to the Groq API to experience the speed of their LPUs.

class tuneapi.apis.model_groq.Groq(id: str | None = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 30), extra_headers: Dict[str, str] | None = None, **kwargs) → str | Dict[str, Any]

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
set_api_token(token: str) → None

This is used to set the API token for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), debug: bool = False, extra_headers: Dict[str, str] | None = None, raw: bool = False)

This is the main function for streaming chat with the model, where each token is generated iteratively.
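
A blocking-chat sketch for the Groq client; the alternative model id and the environment variable name are assumptions:

>>> import os
>>> from tuneapi.apis.model_groq import Groq
>>> model = Groq()  # uses the default id "llama3-70b-8192"
>>> model.set_api_token(os.environ["GROQ_TOKEN"])  # assumed environment variable name
>>> # the model and sampling temperature can be overridden per call
>>> print(model.chat("Explain what an LPU is in one sentence.", model="llama3-8b-8192", temperature=0.2))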

tuneapi.apis.model_mistral module

Connect to the Mistral API and use their LLMs.

class tuneapi.apis.model_mistral.Mistral(id: str | None = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 30), extra_headers: Dict[str, str] | None = None, **kwargs) → str | Dict[str, Any]

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
set_api_token(token: str) → None

This is used to set the API token for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None)

This is the main function for streaming chat with the model, where each token is generated iteratively.
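
A streaming sketch for the Mistral client, assuming stream_chat yields text chunks that can be joined into a single string; the environment variable name is also an assumption:

>>> import os
>>> from tuneapi.apis.model_mistral import Mistral
>>> model = Mistral()  # uses the default id "mistral-small-latest"
>>> model.set_api_token(os.environ["MISTRAL_TOKEN"])  # assumed environment variable name
>>> # reassemble the streamed reply into one string
>>> reply = "".join(chunk for chunk in model.stream_chat("Say hello in French.", max_tokens=64))
>>> print(reply)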

tuneapi.apis.model_openai module

Connect to the OpenAI API and use their LLMs.

class tuneapi.apis.model_openai.Openai(id: str | None = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, extra_headers: Dict[str, str] | None = None, **kwargs) → Any

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
embedding(chats: Thread | List[str] | str, cum: bool = False, model: str = 'text-embedding-3-small', token: str | None = None, timeout=(5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None)

If you pass a list, the returned items are in insertion order.
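
A sketch of the embedding helper; the exact structure of the returned object is not documented here, so it is only printed, and the environment variable name is an assumption:

>>> import os
>>> from tuneapi.apis.model_openai import Openai
>>> model = Openai()  # uses the default id "gpt-4o"
>>> model.set_api_token(os.environ["OPENAI_TOKEN"])  # assumed environment variable name
>>> # a list input returns embeddings in insertion order
>>> emb = model.embedding(["first sentence", "second sentence"], model="text-embedding-3-small")
>>> print(emb)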

set_api_token(token: str) → None

This is used to set the API token for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False)

This is the main function for streaming chat with the model, where each token is generated iteratively.

tuneapi.apis.model_tune module

Connect to the TuneAI Proxy API and use our standard endpoint for AI.

class tuneapi.apis.model_tune.TuneModel(id: str | None = None, base_url: str = 'https://proxy.tune.app/chat/completions', api_token: str | None = None, org_id: str | None = None, extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

Defines the model used in tune.app. See Tune Studio (https://studio.tune.app/) for more information.

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 0.7, token: str | None = None, timeout=(5, 60), stop: List[str] | None = None, extra_headers: Dict[str, str] | None = None, **kwargs) → str | Dict[str, Any]

This is the main function for a blocking (non-streaming) chat with the model.

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
set_api_token(token: str) → None

This is used to set the API token for the model.

set_org_id(org_id: str) → None

This is used to set the Organisation ID for the model.

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 0.7, token: str | None = None, timeout=(5, 60), stop: List[str] | None = None, raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None)

This is the main function for streaming chat with the model, where each token is generated iteratively.
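
A sketch for the Tune proxy client; the model id and the environment variable names are placeholders, and org_id is optional:

>>> import os
>>> from tuneapi.apis.model_tune import TuneModel
>>> model = TuneModel(
...     id="meta/llama-3.1-8b-instruct",        # placeholder model id on the proxy
...     api_token=os.environ["TUNEAPI_TOKEN"],  # assumed environment variable name
...     org_id=os.environ["TUNEORG_ID"],        # assumed environment variable name
... )
>>> # stop sequences end generation early when one of them is produced
>>> print(model.chat("Count from 1 to 10, one number per line.", stop=["5"]))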

tuneapi.apis.turbo module

tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)

Distributes multiple chat prompts across a thread pool for parallel processing.

This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.

Args:
  model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model instance to ensure thread safety.
  prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these prompts.
  post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None, raw responses are stored. Function signature should be: f(chat_response) -> processed_response
  max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and system capabilities.
  retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.
  pbar (bool, default=True): Whether to display a progress bar.

Returns:
  List[Any]: A list of responses or errors, maintaining the same order as the input prompts. Successful responses will be either raw or processed (if post_logic is provided); failed requests (after retries) will contain the last error encountered.

Raises:
  ValueError: If max_threads < 1 or retry < 0
  TypeError: If model is not an instance of ModelInterface

Example:
>>> model = ChatModel(api_token="...")
>>> prompts = [
...     Thread([Message("What is 2+2?")]),
...     Thread([Message("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}\nA: {response}")

Note:
  • Each worker thread gets its own model instance to prevent sharing state

  • Progress bar shows both initial processing and retries

  • The function maintains thread safety through message passing channels
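
Building on the example above, a brief sketch of post_logic, which processes each response before it is stored. The Openai client, the environment variable name, and the import path for Thread and Message are assumptions; the prompt construction mirrors the example above:

>>> import os
>>> from tuneapi.apis.model_openai import Openai
>>> from tuneapi.apis.turbo import distributed_chat
>>> from tuneapi.types import Thread, Message  # assumed import path
>>> model = Openai()
>>> model.set_api_token(os.environ["OPENAI_TOKEN"])  # assumed environment variable name
>>> prompts = [Thread([Message("What is 2+2?")]), Thread([Message("What is Python?")])]
>>> # post_logic trims each response before it is placed in the result list
>>> responses = distributed_chat(model, prompts, post_logic=lambda out: out.strip(), retry=2)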

Module contents