tuneapi.apis package
Submodules
tuneapi.apis.model_anthropic module
Connect to the Anthropic API to use Claude series of LLMs
- class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, return_message: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- set_api_token(token: str) None
This are used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float | None = None, token: str | None = None, timeout=(5, 30), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the main function to stream chat with the model where each token is iteratively generated
- tuneapi.apis.model_anthropic.get_section(tag: str, out: str) str | None
tuneapi.apis.model_gemini module
Connect to the Google Gemini API to their LLMs. See more Gemini.
- class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-1.5-flash', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- set_api_token(token: str) None
This are used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the main function to stream chat with the model where each token is iteratively generated
tuneapi.apis.model_groq module
Connect to the Groq API to experience the speed of their LPUs.
- class tuneapi.apis.model_groq.Groq(id: str | None = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 30), extra_headers: Dict[str, str] | None = None, **kwargs) str | Dict[str, Any]
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- set_api_token(token: str) None
This are used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), debug: bool = False, extra_headers: Dict[str, str] | None = None, raw: bool = False)
This is the main function to stream chat with the model where each token is iteratively generated
tuneapi.apis.model_mistral module
Connect to the Mistral API and use their LLMs.
- class tuneapi.apis.model_mistral.Mistral(id: str | None = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 30), extra_headers: Dict[str, str] | None = None, **kwargs) str | Dict[str, Any]
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- set_api_token(token: str) None
This are used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 1, token: str | None = None, timeout=(5, 60), raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None)
This is the main function to stream chat with the model where each token is iteratively generated
tuneapi.apis.model_openai module
Connect to the OpenAI API and use their LLMs.
- class tuneapi.apis.model_openai.Openai(id: str | None = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, extra_headers: Dict[str, str] | None = None, **kwargs) Any
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- embedding(chats: Thread | List[str] | str, cum: bool = False, model: str = 'text-embedding-3-small', token: str | None = None, timeout=(5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None)
If you pass a list then returned items are in the insertion order
- set_api_token(token: str) None
This are used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, timeout=(5, 60), extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False)
This is the main function to stream chat with the model where each token is iteratively generated
tuneapi.apis.model_tune module
Connect to the TuneAI Proxy API and use our standard endpoint for AI.
- class tuneapi.apis.model_tune.TuneModel(id: str | None = None, base_url: str = 'https://proxy.tune.app/chat/completions', api_token: str | None = None, org_id: str | None = None, extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
Defines the model used in tune.app. See [Tune Studio](https://studio.tune.app/) for more information.
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 0.7, token: str | None = None, timeout=(5, 60), stop: List[str] | None = None, extra_headers: Dict[str, str] | None = None, **kwargs) str | Dict[str, Any]
This is the main function to block chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
- set_api_token(token: str) None
This are used to set the API token for the model
- set_org_id(org_id: str) None
This are used to set the Organisation ID for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 1024, temperature: float = 0.7, token: str | None = None, timeout=(5, 60), stop: List[str] | None = None, raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None)
This is the main function to stream chat with the model where each token is iteratively generated
tuneapi.apis.turbo module
- tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True)
Distributes multiple chat prompts across a thread pool for parallel processing.
This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.
- Args:
- model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model
instance to ensure thread safety.
- prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these
prompts.
- post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None,
raw responses are stored. Function signature should be: f(chat_response) -> processed_response
- max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and
system capabilities.
retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.
pbar (bool, default=True): Whether to display a progress bar.
- Returns:
- List[Any]: A list of responses or errors, maintaining the same order as input prompts.
Successful responses will be either raw or processed (if post_logic provided). Failed requests (after retries) will contain the last error encountered.
- Raises:
ValueError: If max_threads < 1 or retry < 0 TypeError: If model is not an instance of ModelInterface
- Example:
>>> model = ChatModel(api_token="...") >>> prompts = [ ... Thread([Message("What is 2+2?")]), ... Thread([Message("What is Python?")]) ... ] >>> responses = distributed_chat(model, prompts, max_threads=5) >>> for prompt, response in zip(prompts, responses): ... print(f"Q: {prompt}
A: {response} “)
- Note:
Each worker thread gets its own model instance to prevent sharing state
Progress bar shows both initial processing and retries
The function maintains thread safety through message passing channels