
Databricks’ hidden API client

2018-10-10

Databricks provides a CLI tool via a Python library (databricks-cli) that lets you administer most of the core functionality of a Databricks deployment. Within the CLI library is an API client, along with multiple Service objects whose methods map to each service’s API endpoints.

These interfaces aren’t mentioned at all in the README documentation, but one can create a client for each service directly from the CLI library:

from databricks_cli.sdk import ApiClient
from databricks_cli.sdk import service


# ApiClient builds request URLs from host directly, so include the https:// scheme
host = "https://mycompany.cloud.databricks.com"
token = "mytoken"

client = ApiClient(host=host, token=token)
jobs_client = service.JobsService(client)
cluster_client = service.ClusterService(client)
managed_library = service.ManagedLibraryService(client)
# ... etc for dbfs, workspace, secret, groups

clusters = cluster_client.list_clusters()

Here each service object is instantiated with the same ApiClient instance, which holds the host and token used to authenticate every request.
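To make the client/service relationship concrete, here is a minimal, self-contained sketch of the pattern: one client object owns the connection details, and each service object keeps a reference to it and turns method calls into requests against specific endpoints. `SketchApiClient`, `SketchClusterService` and `fake_transport` are hypothetical stand-ins (the `perform_query` name is only loosely modeled on databricks-cli's client), and the fake transport records requests instead of sending them over HTTP, so this is an illustration rather than the library's actual code.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A transport takes (method, url, payload) and returns a JSON-like dict.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


class SketchApiClient:
    """Hypothetical API client: holds host, token and the transport used for requests."""

    def __init__(self, host: str, token: str, transport: Transport) -> None:
        self.base_url = host.rstrip("/") + "/api/2.0"
        self.token = token  # a real client would send this in an Authorization header
        self.transport = transport

    def perform_query(self, method: str, path: str,
                      data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Every service funnels its requests through this single method.
        return self.transport(method, self.base_url + path, data)


class SketchClusterService:
    """Hypothetical service object: each method maps to one endpoint."""

    def __init__(self, client: SketchApiClient) -> None:
        self.client = client

    def list_clusters(self) -> Dict[str, Any]:
        return self.client.perform_query("GET", "/clusters/list")

    def get_cluster(self, cluster_id: str) -> Dict[str, Any]:
        return self.client.perform_query("GET", "/clusters/get", {"cluster_id": cluster_id})


# Record requests instead of sending them, so the sketch runs offline.
sent: List[Tuple[str, str, Optional[Dict[str, Any]]]] = []


def fake_transport(method: str, url: str, data: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    sent.append((method, url, data))
    return {"clusters": []}


client = SketchApiClient("https://mycompany.cloud.databricks.com", "mytoken", fake_transport)
clusters = SketchClusterService(client)
print(clusters.list_clusters())  # {'clusters': []}
print(sent[0])  # ('GET', 'https://mycompany.cloud.databricks.com/api/2.0/clusters/list', None)
```

Adding another service in this design only means writing another class that takes the same client, which is why the CLI library can instantiate every service from a single ApiClient.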

In the databricks-api package, this entire set of services is exposed and simplified into a single, autogenerated API client that wraps the databricks-cli tool:

from databricks_api import DatabricksAPI


host = "mycompany.cloud.databricks.com"
token = "mytoken"

databricks = DatabricksAPI(host=host, token=token)

clusters = databricks.cluster.list_clusters()
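The list call returns the parsed JSON body of the Clusters API `clusters/list` response as a Python dict, so ordinary dict handling is enough to work with it. The sketch below uses a hypothetical helper, `running_cluster_ids`, to collect the IDs of running clusters; it reads the `clusters` key with a default, to be safe in case the key is absent (for example, in a workspace with no clusters). The sample response is hand-written in the same shape, not real output.

```python
from typing import Any, Dict, List


def running_cluster_ids(response: Dict[str, Any]) -> List[str]:
    """Return the IDs of clusters whose state is RUNNING in a clusters/list response."""
    # Default to an empty list in case the "clusters" key is missing.
    return [
        cluster["cluster_id"]
        for cluster in response.get("clusters", [])
        if cluster.get("state") == "RUNNING"
    ]


# Hand-written response with the same shape as clusters/list (not real output).
sample = {
    "clusters": [
        {"cluster_id": "0101-abc", "cluster_name": "etl", "state": "RUNNING"},
        {"cluster_id": "0102-def", "cluster_name": "adhoc", "state": "TERMINATED"},
    ]
}
print(running_cluster_ids(sample))  # ['0101-abc']
```

Against a live workspace the call would be `running_cluster_ids(databricks.cluster.list_clusters())`.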

The DatabricksAPI instance has an attribute for each service described in the documentation. Each attribute is the corresponding CLI service object, so its methods correspond to the available API 2.0 endpoints:

DatabricksAPI.jobs
DatabricksAPI.cluster
DatabricksAPI.managed_library
DatabricksAPI.dbfs
DatabricksAPI.workspace
DatabricksAPI.secret
DatabricksAPI.groups
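The attribute names follow the service class names: dropping the `Service` suffix and converting CamelCase to snake_case turns `ManagedLibraryService` into `managed_library` and `ClusterService` into `cluster`. The sketch below shows one way a wrapper could generate such attributes from a list of service classes sharing one client. `service_attr_name`, `SketchWrapper` and the stub classes are hypothetical, and this illustrates the naming scheme rather than reproducing the databricks-api source.

```python
import re
from typing import Any, Iterable


def service_attr_name(class_name: str) -> str:
    """Turn e.g. 'ManagedLibraryService' into 'managed_library'."""
    base = class_name[: -len("Service")] if class_name.endswith("Service") else class_name
    # Insert an underscore between a lowercase letter or digit and a following capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", base).lower()


class SketchWrapper:
    """Hypothetical wrapper: one shared client, one attribute per service class."""

    def __init__(self, client: Any, service_classes: Iterable[type]) -> None:
        self.client = client
        for cls in service_classes:
            setattr(self, service_attr_name(cls.__name__), cls(client))


# Stub service classes standing in for the real databricks-cli ones.
class _StubService:
    def __init__(self, client: Any) -> None:
        self.client = client


class JobsService(_StubService):
    pass


class ClusterService(_StubService):
    pass


class ManagedLibraryService(_StubService):
    pass


api = SketchWrapper(client=object(), service_classes=[JobsService, ClusterService, ManagedLibraryService])
print(sorted(vars(api)))  # ['client', 'cluster', 'jobs', 'managed_library']
```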

For example, the managed_library methods corresponding to the Libraries API:

DatabricksAPI.managed_library.all_cluster_statuses()

DatabricksAPI.managed_library.cluster_status(cluster_id)

DatabricksAPI.managed_library.install_libraries(
    cluster_id,
    libraries=None
)

DatabricksAPI.managed_library.uninstall_libraries(
    cluster_id,
    libraries=None
)
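The `libraries` argument follows the Libraries API request format: a list of library specifications such as `{"pypi": {"package": "simplejson"}}`. The sketch below wraps that in small helpers; `pypi_library_specs` and `install_pypi_packages` are hypothetical names, and the stub object only stands in for a `DatabricksAPI` instance so the code runs without a workspace.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple


def pypi_library_specs(packages: List[str]) -> List[Dict[str, Any]]:
    """Build Libraries API specs, e.g. ['simplejson'] -> [{'pypi': {'package': 'simplejson'}}]."""
    return [{"pypi": {"package": package}} for package in packages]


def install_pypi_packages(api: Any, cluster_id: str, packages: List[str]) -> Any:
    """Request installation of the given PyPI packages on a cluster via managed_library."""
    return api.managed_library.install_libraries(cluster_id, libraries=pypi_library_specs(packages))


# Stub standing in for DatabricksAPI: it records the call instead of hitting the API.
calls: List[Tuple[str, Optional[List[Dict[str, Any]]]]] = []


def fake_install_libraries(cluster_id: str, libraries: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    calls.append((cluster_id, libraries))
    return {}


stub_api = SimpleNamespace(managed_library=SimpleNamespace(install_libraries=fake_install_libraries))

install_pypi_packages(stub_api, "0101-abc", ["simplejson==3.8.0", "requests"])
print(calls[0])
# ('0101-abc', [{'pypi': {'package': 'simplejson==3.8.0'}}, {'pypi': {'package': 'requests'}}])
```

Against a real workspace, `api` would be the `DatabricksAPI` instance created earlier. Library installation runs asynchronously on the cluster, so `cluster_status(cluster_id)` is where you would check its progress.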

For more details, see the databricks-api documentation, which is autogenerated from the databricks-cli package.

flynn.gg

Christopher Flynn

