VertexAI [Gemini]
Overview
| Property | Details |
|---|---|
| Description | Vertex AI is a fully-managed AI development platform for building and using generative AI. |
| Provider Route on LiteLLM | vertex_ai/ |
| Link to Provider Doc | Vertex AI ↗ |
| Base URL | 1. Regional endpoints: https://{vertex_location}-aiplatform.googleapis.com/ 2. Global endpoints (limited availability): https://aiplatform.googleapis.com/ |
| Supported Operations | /chat/completions, /completions, /embeddings, /audio/speech, /fine_tuning, /batches, /files, /images, /rerank |
| Model Format | Provider | Auth Required |
|---|---|---|
| vertex_ai/gemini-2.0-flash | Vertex AI | GCP credentials + project |
| gemini-2.0-flash (no prefix) | Vertex AI | GCP credentials + project |
| gemini/gemini-2.0-flash | Gemini API | GEMINI_API_KEY (simple API key) |
If you just want to use an API key (like OpenAI), use the gemini/ prefix instead. See Gemini - Google AI Studio.
Models without a prefix default to Vertex AI, which requires GCP authentication.
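For example, the same underlying model can be called through either route (a minimal sketch; the project id is a placeholder):
from litellm import completion
import os

# Vertex AI route: authenticates with GCP credentials + project
response = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
    vertex_project="my-project",      # placeholder project id
    vertex_location="us-central1",
)

# Gemini API route: authenticates with a simple API key
os.environ["GEMINI_API_KEY"] = "..."  # your AI Studio key
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)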
vertex_ai/ route
The vertex_ai/ route uses Vertex AI's REST API.
from litellm import completion
import json
## GET CREDENTIALS
## RUN ##
# !gcloud auth application-default login - run this to add vertex credentials to your env
## OR ##
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
## COMPLETION CALL
response = completion(
    model="vertex_ai/gemini-2.5-pro",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    vertex_credentials=vertex_credentials_json
)
System Message
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
response = completion(
    model="vertex_ai/gemini-2.5-pro",
    messages=[{"content": "You are a good bot.", "role": "system"}, {"content": "Hello, how are you?", "role": "user"}],
    vertex_credentials=vertex_credentials_json
)
Function Calling
Force Gemini to make tool calls with tool_choice="required".
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
messages = [
    {
        "role": "system",
        "content": "Your name is Litellm Bot, you are a helpful assistant",
    },
    # User asks for their name and weather in San Francisco
    {
        "role": "user",
        "content": "Hello, what is your name and can you tell me the weather?",
    },
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
data = {
    "model": "vertex_ai/gemini-1.5-pro-preview-0514",
    "messages": messages,
    "tools": tools,
    "tool_choice": "required",
    "vertex_credentials": vertex_credentials_json
}
## COMPLETION CALL
print(completion(**data))
JSON Schema
From v1.40.1+, LiteLLM supports sending response_schema as a param for Gemini-1.5-Pro on Vertex AI. For other models (e.g. gemini-1.5-flash or claude-3-5-sonnet), LiteLLM adds the schema to the message list with a user-controlled prompt.
Response Schema
- SDK
- PROXY
from litellm import completion
import json
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]
response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "recipe_name": {
                "type": "string",
            },
        },
        "required": ["recipe_name"],
    },
}
response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema}  # 👈 KEY CHANGE
)
print(json.loads(response.choices[0].message.content))
- Add model to config.yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR run `gcloud auth application-default login` to add vertex credentials to your env
or
model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      litellm_credential_name: vertex-global
      vertex_project: project-name-here
      vertex_location: global
      base_model: gemini
    model_info:
      provider: Vertex
- Start Proxy
$ litellm --config /path/to/config.yaml
- Make Request!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gemini-2.5-pro",
  "messages": [
    {"role": "user", "content": "List 5 popular cookie recipes."}
  ],
  "response_format": {"type": "json_object", "response_schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "recipe_name": {
          "type": "string"
        }
      },
      "required": ["recipe_name"]
    }
  }}
}'
Validate Schema
To validate the response_schema, set enforce_validation: true.
- SDK
- PROXY
from litellm import completion, JSONSchemaValidationError
try:
    completion(
        model="vertex_ai/gemini-1.5-pro",
        messages=messages,
        response_format={
            "type": "json_object",
            "response_schema": response_schema,
            "enforce_validation": True  # 👈 KEY CHANGE
        }
    )
except JSONSchemaValidationError as e:
    print("Raw Response: {}".format(e.raw_response))
    raise e
- Add model to config.yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR run `gcloud auth application-default login` to add vertex credentials to your env
- Start Proxy
$ litellm --config /path/to/config.yaml
- Make Request!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gemini-2.5-pro",
  "messages": [
    {"role": "user", "content": "List 5 popular cookie recipes."}
  ],
  "response_format": {
    "type": "json_object",
    "response_schema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "recipe_name": {
            "type": "string"
          }
        },
        "required": ["recipe_name"]
      }
    },
    "enforce_validation": true
  }
}'
LiteLLM will validate the response against the schema, and raise a JSONSchemaValidationError if the response does not match the schema.
- JSONSchemaValidationError inherits from openai.APIError
- Access the raw response with e.raw_response
Add to prompt yourself
from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
messages = [
    {
        "role": "user",
        "content": """
List 5 popular cookie recipes.
Using this JSON schema:
    Recipe = {"recipe_name": str}
Return a `list[Recipe]`
"""
    }
]
completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={"type": "json_object"}, vertex_credentials=vertex_credentials_json)
Google Hosted Tools (Web Search, Code Execution, etc.)
Web Search
Add Google Search result grounding to Vertex AI calls.
See the grounding metadata with response_obj._hidden_params["vertex_ai_grounding_metadata"]
- SDK
- PROXY
from litellm import completion

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"googleSearch": {}}]  # 👈 ADD GOOGLE SEARCH
resp = completion(
    model="vertex_ai/gemini-1.0-pro-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=tools,
)
print(resp)
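# The grounding metadata mentioned above can be read back off the response's
# hidden params (illustrative addition; present when grounding actually ran):
print(resp._hidden_params["vertex_ai_grounding_metadata"])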
- OpenAI Python SDK
- cURL
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearch": {}}],
)
print(response)
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-2.5-pro",
"messages": [
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
"googleSearch": {}
}
]
}'
URL Context
Using the URL context tool, you can provide Gemini with URLs as additional context for your prompt. The model can then retrieve content from the URLs and use that content to inform and shape its response.
See the grounding metadata with response_obj._hidden_params["vertex_ai_url_context_metadata"]
- SDK
- PROXY
from litellm import completion
import os
os.environ["GEMINI_API_KEY"] = ".."
# 👇 ADD URL CONTEXT
tools = [{"urlContext": {}}]
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Summarize this document: https://ai.google.dev/gemini-api/docs/models"}],
    tools=tools,
)
print(response)
# Access URL context metadata
url_context_metadata = response.model_extra['vertex_ai_url_context_metadata']
urlMetadata = url_context_metadata[0]['urlMetadata'][0]
print(f"Retrieved URL: {urlMetadata['retrievedUrl']}")
print(f"Retrieval Status: {urlMetadata['urlRetrievalStatus']}")
- Setup config.yaml
model_list:
  - model_name: gemini-2.0-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
- Start Proxy
$ litellm --config /path/to/config.yaml
- Make Request!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "gemini-2.0-flash",
"messages": [{"role": "user", "content": "Summarize this document: https://ai.google.dev/gemini-api/docs/models"}],
"tools": [{"urlContext": {}}]
}'
Enterprise Web Search
You can also use the enterpriseWebSearch tool for an enterprise-compliant search.
- SDK
- PROXY
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"enterpriseWebSearch": {}}]  # 👈 ADD GOOGLE ENTERPRISE SEARCH
resp = completion(
    model="vertex_ai/gemini-1.0-pro-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=tools,
)
print(resp)
- OpenAI Python SDK
- cURL
from openai import OpenAI
client = OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"enterpriseWebSearch": {}}],
)
print(response)
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-2.5-pro",
"messages": [
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
"enterpriseWebSearch": {}
}
]
}'
Code Execution
- SDK
- PROXY
from litellm import completion
import os
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"codeExecution": {}}]  # 👈 ADD CODE EXECUTION
response = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    tools=tools,
)
print(response)
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gemini-2.0-flash",
"messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
"tools": [{"codeExecution": {}}]
}
'
Google Maps
Use Google Maps to provide location-based context to your Gemini models.
- SDK
- PROXY
Basic Usage - Enable Widget Only
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"googleMaps": {"enableWidget": "ENABLE_WIDGET"}}] # 👈 ADD GOOGLE MAPS
resp = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=tools,
)
print(resp)
With Location Data
You can specify a location to ground the model's responses with location-specific information:
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{
    "googleMaps": {
        "enableWidget": "ENABLE_WIDGET",
        "latitude": 37.7749,     # San Francisco latitude
        "longitude": -122.4194,  # San Francisco longitude
        "languageCode": "en_US"  # Optional: language for results
    }
}]  # 👈 ADD GOOGLE MAPS WITH LOCATION
resp = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=tools,
)
print(resp)
- OpenAI Python SDK
- cURL
Basic Usage - Enable Widget Only
from openai import OpenAI
client = OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=[{"googleMaps": {"enableWidget": "ENABLE_WIDGET"}}],
)
print(response)
With Location Data
from openai import OpenAI
client = OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=[{
        "googleMaps": {
            "enableWidget": "ENABLE_WIDGET",
            "latitude": 37.7749,     # San Francisco latitude
            "longitude": -122.4194,  # San Francisco longitude
            "languageCode": "en_US"  # Optional: language for results
        }
    }],
)
print(response)
Basic Usage - Enable Widget Only
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{"role": "user", "content": "What restaurants are nearby?"}
],
"tools": [
{
"googleMaps": {"enableWidget": "ENABLE_WIDGET"}
}
]
}'
With Location Data
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{"role": "user", "content": "What restaurants are nearby?"}
],
"tools": [
{
"googleMaps": {
"enableWidget": "ENABLE_WIDGET",
"latitude": 37.7749,
"longitude": -122.4194,
"languageCode": "en_US"
}
}
]
}'
Moving from Vertex AI SDK to LiteLLM (GROUNDING)
If this was your initial VertexAI Grounding code,
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Tool, grounding
vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")
# Use Google Search for grounding
tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
prompt = "When is the next total solar eclipse in US?"
response = model.generate_content(
    prompt,
    tools=[tool],
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)
print(response)
then this is what it looks like now:
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"googleSearch": {"disable_attribution": False}}]  # 👈 ADD GOOGLE SEARCH
resp = completion(
    model="vertex_ai/gemini-1.0-pro-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=tools,
    vertex_project="project-id"
)
print(resp)
Thinking / reasoning_content
LiteLLM translates OpenAI's reasoning_effort to Gemini's thinking parameter.
It also accepts an additional, non-OpenAI-standard "disable" value for non-reasoning Gemini requests (see the sketch after the mapping table below).
Mapping
| reasoning_effort | thinking |
|---|---|
| "disable" | "budget_tokens": 0 |
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |
- SDK
- PROXY
from litellm import completion
# !gcloud auth application-default login - run this to add vertex credentials to your env
resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)
- Setup config.yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-04-17
      vertex_credentials: "/path/to/service_account.json"
      vertex_project: "project-id"
      vertex_location: "us-central1"
- Start proxy
litellm --config /path/to/config.yaml
- Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"reasoning_effort": "low"
}'
Expected Response
ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='vertex_ai/gemini-2.5-flash-preview-04-17',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)
Pass thinking to Gemini models
You can also pass the thinking parameter to Gemini models.
This is translated to Gemini's thinkingConfig parameter.
- SDK
- PROXY
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env
response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    vertex_project="project-id",
    vertex_location="us-central1"
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "vertex_ai/gemini-2.5-flash-preview-04-17",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"thinking": {"type": "enabled", "budget_tokens": 1024}
}'
Context Caching
Unified Endpoint
Use Vertex AI context caching in the same way as Google AI Studio - Context Caching
Example usage
- SDK
- SDK with Custom TTL
- PROXY
from litellm import completion

for _ in range(2):
    resp = completion(
        model="vertex_ai/gemini-2.5-pro",
        messages=[
            # System Message
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "Here is the full text of a complex legal agreement" * 4000,
                        "cache_control": {"type": "ephemeral"},  # 👈 KEY CHANGE
                    }
                ],
            },
            # marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What are the key terms and conditions in this agreement?",
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            }
        ]
    )
    print(resp.usage)  # 👈 2nd usage block will be less, since cached tokens used
from litellm import completion

# Cache for 2 hours (7200 seconds)
resp = completion(
    model="vertex_ai/gemini-2.5-pro",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "Here is the full text of a complex legal agreement" * 4000,
                    "cache_control": {
                        "type": "ephemeral",
                        "ttl": "7200s"  # 👈 Cache for 2 hours
                    },
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the key terms and conditions in this agreement?",
                    "cache_control": {
                        "type": "ephemeral",
                        "ttl": "3600s"  # 👈 This TTL will be ignored (first one is used)
                    },
                }
            ],
        }
    ]
)
print(resp.usage)
- Setup config.yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json"
- Start proxy
litellm --config /path/to/config.yaml
- Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "Long cache message (must be >= 1024 tokens)",
"cache_control": {
"type": "ephemeral",
"ttl": "7200s"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is the text about?"
}
]
}
]
}'
Calling the provider API directly
1. Create the Cache
First, create the cache by sending a POST request to the cachedContents endpoint via the LiteLLM proxy.
- PROXY
curl http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
"displayName": "example_cache",
"contents": [{
"role": "user",
"parts": [{
"text": ".... a long book to be cached"
}]
}]
}'
2. Get the Cache Name from the Response
Vertex AI will return a response containing the name of the cached content. This name is the identifier for your cached data.
{
"name": "projects/12341234/locations/{location}/cachedContents/123123123123123",
"model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
"createTime": "2025-09-23T19:13:50.674976Z",
"updateTime": "2025-09-23T19:13:50.674976Z",
"expireTime": "2025-09-23T20:13:50.655988Z",
"displayName": "example_cache",
"usageMetadata": {
"totalTokenCount": 1246,
"textCount": 5132
}
}
3. Use the Cached Content
Use the name from the response as cachedContent or cached_content in subsequent API calls to reuse the cached information. This is passed in the body of your request to /chat/completions.
- PROXY
curl http://0.0.0.0:4000/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
"model": "gemini-2.5-flash",
"messages": [
{
"role": "user",
"content": "what is the book about?"
}
]
}'
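From the OpenAI Python SDK, the same field can be forwarded to the proxy via extra_body (a sketch; the cache name is the one returned in step 2):
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",  # litellm proxy key
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "what is the book about?"}],
    # extra_body fields are merged into the JSON request body sent to the proxy
    extra_body={"cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232"},
)
print(response)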
Pre-requisites
- pip install google-cloud-aiplatform (pre-installed on proxy docker image)
- Authentication:
  - run gcloud auth application-default login (see Google Cloud Docs), OR
  - alternatively, set GOOGLE_APPLICATION_CREDENTIALS. Here's how (a sketch follows this list):
    1. Create a service account on GCP
    2. Export the credentials as a JSON file
    3. Load the JSON and json.dumps it as a string
    4. Store the JSON string in your environment as GOOGLE_APPLICATION_CREDENTIALS
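A minimal sketch of those steps (the key file path is a placeholder):
import json
import os

# Load the exported service account key
with open("path/to/service_account.json") as f:
    creds = json.load(f)

# Dump it back to a string and store it in the environment for LiteLLM
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = json.dumps(creds)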
Sample Usage
import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location
response = litellm.completion(model="gemini-2.5-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])
Usage with LiteLLM Proxy Server
Here's how to use Vertex AI with the LiteLLM Proxy Server
- Modify the config.yaml
- Different location per model
- One location all vertex models
Use this when you need to set a different location for each vertex model
model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east1"
Use this when you have one vertex location for all models
litellm_settings:
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location
model_list:
  - model_name: team1-gemini-2.5-pro
    litellm_params:
      model: gemini-2.5-pro
- Start the proxy
$ litellm --config /path/to/config.yaml
- Send Request to LiteLLM Proxy Server
- OpenAI Python v1.0.0+
- curl
import openai
client = openai.OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="team1-gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
  "model": "team1-gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
Authentication - vertex_project, vertex_location, etc.
Set your vertex credentials via:
- dynamic params OR
- env vars
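For example (a minimal sketch; the project id and region are placeholders, and VERTEXAI_PROJECT / VERTEXAI_LOCATION are the env var names LiteLLM reads):
from litellm import completion
import os

# Option 1: dynamic params on the call itself
response = completion(
    model="vertex_ai/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    vertex_project="my-project",
    vertex_location="us-central1",
)

# Option 2: environment variables
os.environ["VERTEXAI_PROJECT"] = "my-project"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
    model="vertex_ai/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)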