Getting Started
===============

Primary Usage: edk-cli
----------------------

The recommended and primary way to run Earth Data Kit (EDK) is via edk-cli. This approach avoids dependency issues and ensures a consistent environment across platforms.

Requirements
------------

* Python 3.12 or newer
* Docker

Quick Start
-----------

1. **Get and install edk-cli:**

   .. code-block:: console

      $ pip3 install https://github.com/earth-data-kit/edk-cli/releases/download/0.1.0/edk_cli-0.1.0-py3-none-any.whl

2. **Create your .env file using edk configure:**

   .. code-block:: console

      $ edk configure

   This will help you create a ``.env`` file. See the "Environment Configuration" section below for details on available options.

   .. note::

      Use relative paths (not absolute paths) when specifying directories.

      - Example: ``./workspace``
      - Avoid: ``/Users/username/earth-data-kit/workspace``

3. **Initialize the EDK container:**

   .. code-block:: console

      $ edk run

   This will build and start the EDK Docker container with all dependencies pre-installed. Additionally, if you have a ``requirements.txt`` file inside the ``workspace`` directory, it will be installed automatically inside the container.

4. **SSH into the container:**

   If you want an interactive shell inside the container, run:

   .. code-block:: console

      $ edk ssh

   This will open a bash shell inside the EDK container, allowing you to run commands interactively.

5. **(Optional) Start a JupyterLab server inside the container:**

   If you want to use JupyterLab for interactive development, you can launch a JupyterLab server inside the EDK container by running:

   .. code-block:: console

      $ edk notebook

   This will start a JupyterLab server accessible from your browser. By default, it will be available at ``http://localhost:8888`` on your host machine. You can then open notebooks and interact with your code and data directly within the container environment.

For more practical usage, check out the ``examples`` folder in the repository: https://github.com/earth-data-kit/earth-data-kit/tree/master/examples

You'll find sample scripts and workflows demonstrating how to use Earth Data Kit with different data sources and scenarios.

Environment Configuration
-------------------------

Earth Data Kit can be customized via environment variables, which you should define in your ``.env`` file. This lets you easily configure settings such as AWS credentials, GDAL options, and other operational parameters.

General Options
~~~~~~~~~~~~~~~

* ``DATA_DIR`` *(Required)*: The directory path used for storing and sharing data within the container (e.g., catalog, pre-processed VRTs). It is also used for any intermediate files.
* ``WORKSPACE_DIR`` *(Required)*: The directory path used for storing your scripts, notebooks, etc.
* ``EDK_MAX_WORKERS``: The maximum number of workers to use for parallel processing. If not set, EDK uses ``num_cores - 2`` for CPU-intensive tasks and ``(2 * num_cores) - 1`` for I/O-intensive tasks.

AWS Options
~~~~~~~~~~~

* ``AWS_CONFIG_DIR``: By default, EDK uses ``~/.aws`` for AWS credentials and config. Set this variable to override the location.
* ``AWS_REGION``: AWS region where your data is stored (e.g., us-west-2). Use this when accessing S3.
* ``AWS_NO_SIGN_REQUEST`` (YES/NO): If set to YES, request signing is disabled, so AWS credentials are not used.
* ``AWS_REQUEST_PAYER`` (requester): Indicates that the requester accepts any charges that may result from the request. Use this when accessing buckets that require payer confirmation.

Google Earth Engine Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``GOOGLE_APPLICATION_CREDENTIALS``: Specifies the path to the JSON credentials file for authenticating with the Earth Engine API. See the Earth Engine service account guide for more information.

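
As a rough illustration, a ``.env`` file combining the options above might look like the following. Every value is a placeholder rather than a default confirmed by EDK: the directory names follow the relative-path convention recommended in Quick Start, the credentials path is hypothetical, and the sketch assumes that full-line ``#`` comments are accepted in the file.

.. code-block:: bash

   # Required: relative paths for shared data and for your scripts/notebooks
   DATA_DIR=./data
   WORKSPACE_DIR=./workspace

   # Optional: cap on parallel workers (placeholder value)
   EDK_MAX_WORKERS=4

   # Optional AWS settings (placeholder values)
   # AWS_CONFIG_DIR=./aws-config
   AWS_REGION=us-west-2
   AWS_NO_SIGN_REQUEST=NO
   AWS_REQUEST_PAYER=requester

   # Optional: hypothetical path to an Earth Engine service account key
   GOOGLE_APPLICATION_CREDENTIALS=./credentials/service-account.json

Running ``edk configure`` remains the documented way to create this file; a hand-written version like the one above is only meant to show how the variables fit together.
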
This configuration provides the flexibility to adapt Earth Data Kit to your specific environment and processing needs.
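
The default worker counts described under ``EDK_MAX_WORKERS`` can be sketched in a few lines of Python. This is an illustration of the documented formulas, not EDK's actual implementation: the function names are hypothetical, clamping the result to at least one worker is an assumption for machines with one or two cores, and applying an explicit ``EDK_MAX_WORKERS`` value to both task types is likewise an assumption.

.. code-block:: python

   import os


   def default_workers(num_cores: int) -> tuple[int, int]:
       """Return (cpu_workers, io_workers) from the documented defaults."""
       # Documented default for CPU-intensive tasks: num_cores - 2.
       # Clamping to a minimum of 1 is an assumption, not documented behavior.
       cpu_workers = max(1, num_cores - 2)
       # Documented default for I/O-intensive tasks: (2 * num_cores) - 1.
       io_workers = max(1, (2 * num_cores) - 1)
       return cpu_workers, io_workers


   def resolve_workers(env=None) -> tuple[int, int]:
       """Resolve worker counts from EDK_MAX_WORKERS or fall back to defaults."""
       env = os.environ if env is None else env
       override = env.get("EDK_MAX_WORKERS")
       if override:
           # Assumption: an explicit value applies to both kinds of tasks.
           count = int(override)
           return count, count
       # os.cpu_count() can return None, so fall back to a single core.
       return default_workers(os.cpu_count() or 1)


   if __name__ == "__main__":
       cpu, io = resolve_workers()
       print(f"CPU-intensive workers: {cpu}, I/O-intensive workers: {io}")

On an 8-core machine with ``EDK_MAX_WORKERS`` unset, the documented formulas give 6 workers for CPU-intensive tasks and 15 for I/O-intensive tasks.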