Synchronization

class ckan_api_client.syncing.SynchronizationClient(base_url, api_key=None, **kw)

Synchronization client, providing functionality for importing collections of datasets into a Ckan instance.

Synchronization acts as follows:

  • Ensure all the required organizations/groups are there; create a map between “source” ids and Ckan ids. Optionally update existing organizations/groups with new details.
  • Find all the Ckan datasets matching the source_name
  • Determine which datasets...
    • ...need to be created
    • ...need to be updated
    • ...need to be deleted
  • First, delete datasets to be deleted in order to free up names
  • Then, create datasets that need to be created
  • Lastly, update datasets using the configured merge strategy (see constructor arguments).
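
The create/update/delete decision in the steps above can be pictured as plain set arithmetic on dataset ids. The following is a minimal sketch, not the library's implementation; the function name plan_sync and its arguments are hypothetical and exist only for illustration.

```python
def plan_sync(source_ids, existing_ids):
    """Split dataset ids into the three buckets used during synchronization.

    source_ids:   ids present in the data being imported (hypothetical input)
    existing_ids: ids of datasets already in Ckan for the same source_name
    """
    source_ids = set(source_ids)
    existing_ids = set(existing_ids)
    return {
        # Only in Ckan: removed first, which frees up their names.
        "delete": sorted(existing_ids - source_ids),
        # Only in the source: created second.
        "create": sorted(source_ids - existing_ids),
        # On both sides: updated last, using the configured merge strategy.
        "update": sorted(source_ids & existing_ids),
    }


plan = plan_sync(["a", "b", "c"], ["b", "c", "d"])
```

Here "d" would be deleted, "a" created, and "b" and "c" updated, in that order.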
__init__(base_url, api_key=None, **kw)
Parameters:
  • base_url – Base URL of the Ckan instance, passed to high-level client
  • api_key – API key to be used, passed to high-level client
  • organization_merge_strategy

    One of:

    • ‘create’ (default) if the organization doesn’t exist, create it. Otherwise, leave it alone.
    • ‘update’ if the organization doesn’t exist, create it. Otherwise, update with new values.
  • group_merge_strategy

    One of:

    • ‘create’ (default) if the group doesn’t exist, create it. Otherwise, leave it alone.
    • ‘update’ if the group doesn’t exist, create it. Otherwise, update with new values.
  • dataset_preserve_names – if True (the default), existing datasets keep their current names.
  • dataset_preserve_organization – if True (the default), existing datasets keep their current organization.
  • dataset_group_merge_strategy
    • ‘add’ add groups, keep old ones (default)
    • ‘replace’ replace all existing groups
    • ‘preserve’ leave groups alone
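
The three dataset_group_merge_strategy values can be illustrated with a small helper. This is a sketch of the behaviour as described in the list above, assuming groups are represented as lists of names; merge_dataset_groups is a hypothetical function, not part of ckan_api_client.

```python
def merge_dataset_groups(existing, incoming, strategy="add"):
    """Return the groups a dataset would have after an update (illustrative only)."""
    if strategy == "add":
        # Keep the old groups and append new ones that are not already present.
        merged = list(existing)
        for group in incoming:
            if group not in merged:
                merged.append(group)
        return merged
    if strategy == "replace":
        # Discard the old groups entirely.
        return list(incoming)
    if strategy == "preserve":
        # Ignore the incoming groups.
        return list(existing)
    raise ValueError("unknown strategy: %r" % (strategy,))


old_groups = ["economy", "health"]
new_groups = ["health", "transport"]
added = merge_dataset_groups(old_groups, new_groups, "add")
replaced = merge_dataset_groups(old_groups, new_groups, "replace")
preserved = merge_dataset_groups(old_groups, new_groups, "preserve")
```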
sync(source_name, data)

Synchronize data from a source into Ckan.

  • datasets are matched by _harvest_source
  • groups and organizations are matched by name
Parameters:
  • source_name – String identifying the source of the data. Used to build ids that will be used in further synchronizations.
  • data – Data to be synchronized. Should be a dict (or dict-like) with top-level keys corresponding to the object type, mapping to dictionaries of {'id': <object>}.
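
A possible shape for the data argument, together with a call to sync(), is sketched below. The top-level key names ("organization", "group", "dataset"), the object fields, the URL, the API key and the source name are all assumptions chosen for illustration; only the import path, the constructor keyword names and the sync(source_name, data) signature come from the documentation above.

```python
# Assumed layout: one top-level key per object type, each mapping
# source ids to the object itself.
data = {
    "organization": {
        "org-1": {"name": "example-org", "title": "Example Organization"},
    },
    "group": {
        "grp-1": {"name": "example-group", "title": "Example Group"},
    },
    "dataset": {
        "ds-1": {
            "name": "example-dataset",
            "title": "Example Dataset",
            # Assumed to refer to the source ids defined above.
            "owner_org": "org-1",
            "groups": ["grp-1"],
        },
    },
}


def run_sync(base_url, api_key, source_name, payload):
    """Import `payload` into Ckan; requires ckan_api_client to be installed."""
    # Imported lazily so the data layout can be inspected without the library.
    from ckan_api_client.syncing import SynchronizationClient

    client = SynchronizationClient(
        base_url,
        api_key=api_key,
        organization_merge_strategy="update",
        dataset_group_merge_strategy="replace",
    )
    client.sync(source_name, payload)


# Example call with placeholder values (not executed here):
# run_sync("http://127.0.0.1:5000", "my-api-key", "my-source", data)
```

Re-running the same call with the same source_name is what lets later synchronizations find the datasets imported earlier.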