Machine Learning Systems Architect, PhD Mathematician
Databricks provides a CLI tool, distributed as a Python library, that allows you to administer most of the core functionality of a Databricks implementation. Within the CLI library are an API client and multiple service objects whose methods map to each service’s API endpoints.
These interfaces aren’t mentioned in the README documentation, but one can create an API client for each service directly from the CLI library:
from databricks_cli.sdk import ApiClient
from databricks_cli.sdk import service
host = "mycompany.cloud.databricks.com"
token = "mytoken"
client = ApiClient(host=host, token=token)
jobs_client = service.JobsService(client)
cluster_client = service.ClusterService(client)
managed_library = service.ManagedLibraryService(client)
# ... etc for dbfs, workspace, secret, groups
clusters = cluster_client.list_clusters()
Here each API service is instantiated using the same ApiClient instance.
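As a rough illustration of working with the result of a call such as list_clusters(), the sketch below pulls a short summary out of the returned dictionary. It assumes the response follows the shape of the Clusters API 2.0 list response (a dict with a clusters list whose entries carry cluster_name and state). The helper name summarize_clusters and the sample data are made up for this example and are not part of databricks-cli.

```python
from typing import Any, Dict, List, Tuple


def summarize_clusters(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (cluster_name, state) pairs from a clusters/list style response."""
    # Assumption: the "clusters" key may be absent when there are no clusters,
    # so fall back to an empty list instead of raising KeyError.
    clusters = response.get("clusters", [])
    return [
        (c.get("cluster_name", "<unnamed>"), c.get("state", "UNKNOWN"))
        for c in clusters
    ]


# Made-up sample data standing in for the return value of list_clusters().
sample_response = {
    "clusters": [
        {"cluster_id": "0000-000000-aaaa000", "cluster_name": "etl", "state": "RUNNING"},
        {"cluster_id": "0000-000000-bbbb000", "cluster_name": "adhoc", "state": "TERMINATED"},
    ]
}

for name, state in summarize_clusters(sample_response):
    print(f"{name}: {state}")
```

With a real client, the same helper could be applied to the value returned by cluster_client.list_clusters().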
In the databricks-api package, this entire set of services is exposed through a single, autogenerated API client that wraps the databricks-cli tool:
from databricks_api import DatabricksAPI
host = "mycompany.cloud.databricks.com"
token = "mytoken"
databricks = DatabricksAPI(host=host, token=token)
clusters = databricks.cluster.list_clusters()
The DatabricksAPI instance provides an attribute for each service described in the documentation. Each attribute object exposes the underlying methods of the corresponding CLI service, which in turn correspond to the available API 2.0 endpoints.
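The attribute-per-service layout can be pictured with a small stand-alone sketch of the pattern. This is not the databricks-api source: FakeApiClient, FakeClusterService, FakeJobsService and FacadeAPI are hypothetical stand-ins that only record requests, so the example runs without network access or credentials.

```python
from typing import List, Tuple


class FakeApiClient:
    """Hypothetical stand-in for an API client; it only records requests."""

    def __init__(self, host: str, token: str) -> None:
        self.host = host
        self.token = token
        self.calls: List[Tuple[str, str]] = []

    def request(self, method: str, path: str) -> dict:
        # Record the call instead of sending an HTTP request.
        self.calls.append((method, path))
        return {}


class FakeClusterService:
    """Hypothetical service object bound to a shared client."""

    def __init__(self, client: FakeApiClient) -> None:
        self.client = client

    def list_clusters(self) -> dict:
        return self.client.request("GET", "/clusters/list")


class FakeJobsService:
    """Second hypothetical service sharing the same client."""

    def __init__(self, client: FakeApiClient) -> None:
        self.client = client

    def list_jobs(self) -> dict:
        return self.client.request("GET", "/jobs/list")


class FacadeAPI:
    """One entry point that builds the client once and hangs each service off an attribute."""

    def __init__(self, host: str, token: str) -> None:
        self.client = FakeApiClient(host, token)
        self.cluster = FakeClusterService(self.client)
        self.jobs = FakeJobsService(self.client)


api = FacadeAPI(host="mycompany.cloud.databricks.com", token="mytoken")
api.cluster.list_clusters()
api.jobs.list_jobs()
print(api.client.calls)
```

The point of the design is that credentials are supplied once and every service reuses the same client, which is what makes calls such as databricks.cluster.list_clusters() possible without constructing each service by hand.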
For example, the managed_library attribute provides the methods corresponding to the Libraries API:
DatabricksAPI.managed_library.all_cluster_statuses()
DatabricksAPI.managed_library.cluster_status(cluster_id)
DatabricksAPI.managed_library.install_libraries(
cluster_id,
libraries=None
)
DatabricksAPI.managed_library.uninstall_libraries(
cluster_id,
libraries=None
)
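The libraries argument is left as None in the signatures above. As a hedged sketch, the snippet below builds a list of library specifications in the form described by the Libraries API 2.0 (for example a pypi entry with a package name, or a jar entry with a path). The helpers pypi_library and jar_library, the package name and the DBFS path are illustrative assumptions, not part of databricks-api.

```python
from typing import Any, Dict, Optional


def pypi_library(package: str, repo: Optional[str] = None) -> Dict[str, Any]:
    """Build a PyPI library specification (hypothetical helper)."""
    spec: Dict[str, Any] = {"package": package}
    if repo is not None:
        # Optional custom package index.
        spec["repo"] = repo
    return {"pypi": spec}


def jar_library(path: str) -> Dict[str, Any]:
    """Build a JAR library specification (hypothetical helper)."""
    return {"jar": path}


# Illustrative values only; substitute your own package names and paths.
libraries = [
    pypi_library("simplejson"),
    jar_library("dbfs:/mnt/libraries/library.jar"),
]
print(libraries)

# With a real DatabricksAPI instance named databricks, the list would be passed as:
# databricks.managed_library.install_libraries(cluster_id, libraries=libraries)
```

The same list shape would be used for uninstall_libraries.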
For more details, view the documentation here, which is autogenerated from the databricks-cli package.