flynn.gg

Christopher Flynn

Machine Learning
Systems Architect,
PhD Mathematician



Databricks’ hidden API client

2018-10-10

Databricks provides a CLI tool via a Python library that lets you administer most of the core functionality of a Databricks deployment. Within the CLI library is an API client and multiple Service objects whose methods map to each service's API endpoints.

These interfaces aren't mentioned at all in the README, but one can create an API client for each service directly from the CLI library:

from databricks_cli.sdk import ApiClient
from databricks_cli.sdk import service


# Include the scheme: ApiClient builds its request URLs directly from this value.
host = "https://mycompany.cloud.databricks.com"
token = "mytoken"

client = ApiClient(host=host, token=token)
jobs_client = service.JobsService(client)
cluster_client = service.ClusterService(client)
managed_library = service.ManagedLibraryService(client)
# ... etc for dbfs, workspace, secret, groups

clusters = cluster_client.list_clusters()

Each service object is instantiated with the same ApiClient instance, so all services share one host and token.
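To make the client/service split concrete, here is a minimal sketch of the pattern. The classes `RecordingClient` and `ClusterServiceSketch` are hypothetical stand-ins written for this post, not the databricks-cli source; the stand-in client records requests instead of sending them, so the snippet runs without a Databricks workspace.

```python
class RecordingClient:
    """Hypothetical stand-in for ApiClient: records requests instead of sending them."""

    def __init__(self, host, token):
        self.host = host
        self.token = token
        self.calls = []

    def perform_query(self, method, path, data=None):
        # A real client would issue an authenticated HTTP request here.
        self.calls.append((method, path, data or {}))
        return {"method": method, "path": path}


class ClusterServiceSketch:
    """Hypothetical service object: each method maps to one REST endpoint."""

    def __init__(self, client):
        self.client = client

    def list_clusters(self):
        return self.client.perform_query("GET", "/clusters/list")

    def get_cluster(self, cluster_id):
        return self.client.perform_query(
            "GET", "/clusters/get", data={"cluster_id": cluster_id}
        )


client = RecordingClient(host="https://mycompany.cloud.databricks.com", token="mytoken")
clusters = ClusterServiceSketch(client)
clusters.list_clusters()
clusters.get_cluster("1234-567890-abc123")  # made-up cluster id
```

The service holds no connection state of its own. It only translates method calls into (HTTP method, path, payload) triples, which is why a single client can back every service.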

In the databricks-api package, this entire set of services is exposed and simplified into a single, autogenerated API client that wraps the databricks-cli tool:

from databricks_api import DatabricksAPI


host = "mycompany.cloud.databricks.com"
token = "mytoken"

databricks = DatabricksAPI(host=host, token=token)

clusters = databricks.cluster.list_clusters()
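One way to picture what a wrapper like this does is a facade that instantiates every service with one shared client and hangs each off an attribute. The sketch below illustrates that idea under stated assumptions; `ApiFacade`, `SERVICES`, and the stub classes are hypothetical names, and this is not the actual databricks-api implementation.

```python
class StubService:
    """Hypothetical stand-in for a databricks-cli service class."""

    def __init__(self, client):
        self.client = client


class JobsStub(StubService):
    pass


class ClusterStub(StubService):
    pass


# Assumed registry: attribute name -> service class.
SERVICES = {"jobs": JobsStub, "cluster": ClusterStub}


class ApiFacade:
    """Builds every registered service around a single shared client."""

    def __init__(self, client):
        self.client = client
        for name, service_cls in SERVICES.items():
            setattr(self, name, service_cls(client))


shared_client = object()  # placeholder for a real ApiClient
api = ApiFacade(shared_client)
```

With this layout, adding a service means adding one registry entry, which is the kind of structure that lends itself to autogeneration.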

The DatabricksAPI instance provides an attribute for each service described in the documentation. Each attribute exposes the underlying CLI service's methods, which correspond to the available API 2.0 endpoints.

For example, the managed_library methods corresponding to the Libraries API:

DatabricksAPI.managed_library.all_cluster_statuses()

DatabricksAPI.managed_library.cluster_status(cluster_id)

DatabricksAPI.managed_library.install_libraries(
    cluster_id,
    libraries=None
)

DatabricksAPI.managed_library.uninstall_libraries(
    cluster_id,
    libraries=None
)
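The `libraries` argument takes a list of library specifications in the shape the Libraries API expects. As a rough sketch, the helpers below (`pypi_library` and `jar_library`, hypothetical names introduced here) build such a list; the package name and DBFS path are placeholders.

```python
def pypi_library(package, repo=None):
    """Build a PyPI library spec, optionally pinned to a custom index."""
    spec = {"package": package}
    if repo is not None:
        spec["repo"] = repo
    return {"pypi": spec}


def jar_library(uri):
    """Build a JAR library spec from a storage URI."""
    return {"jar": uri}


libraries = [
    pypi_library("simplejson"),
    jar_library("dbfs:/mnt/libraries/library.jar"),
]

# With a DatabricksAPI instance named `databricks` (as created earlier in the
# post), the list would be passed along like this:
# databricks.managed_library.install_libraries(cluster_id, libraries=libraries)
```

Building the specs with small functions keeps the nested dictionary shapes in one place rather than scattered across call sites.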

For more details, see the databricks-api documentation, which is autogenerated from the databricks-cli package.
