Tutorial: Use sdutil to load data into Seismic Store
Seismic Store is a cloud-based solution for storing and managing datasets of any size. It provides a secure way to access datasets through a scoped authorization mechanism. Seismic Store overcomes cloud providers' object size limitations by managing generic datasets as multiple independent objects.
Sdutil is a command-line Python tool for interacting with Seismic Store. You can use sdutil to perform basic operations like uploading data to Seismic Store, downloading datasets from Seismic Store, managing users, and listing folder contents.
In this tutorial, you learn how to:
- Set up and run the sdutil tool.
- Obtain the Seismic Store URI.
- Create a subproject.
- Register a user.
- Use sdutil to manage datasets with Seismic Store.
- Run tests to validate the sdutil tool's functionalities.
Prerequisites
Install the prerequisites for your operating system (Windows, Linux, or Unix/Mac).
Sdutil requires other modules noted in `requirements.txt`. You can either install the modules as is or install them in a virtual environment to keep your host clean from package conflicts. If you don't want to install them in a virtual environment, skip the four virtual environment commands in the following code. Additionally, if you're using Mac instead of Ubuntu or WSL - Ubuntu 20.04, either use `homebrew` instead of `apt-get` as your package manager, or manually install `apt-get`.
```bash
# Check if virtualenv is already installed
virtualenv --version

# If not, install it via pip or apt-get
pip install virtualenv
# or: sudo apt-get install python3-venv   (WSL)

# Create a virtual environment for sdutil
virtualenv sdutilenv
# or: python3 -m venv sdutilenv   (WSL)

# Activate the virtual environment
# Windows: sdutilenv/Scripts/activate
# Linux:   source sdutilenv/bin/activate
```
Install required dependencies:
```bash
# Run this from the extracted sdutil folder
pip install -r requirements.txt
```
Usage
Configuration
Clone the sdutil repository from the community `azure-stable` branch and open it in your favorite editor. Replace the contents of `config.yaml` in the `sdlib` folder with the following YAML. Fill in the three templatized values (two instances of `<meds-instance-url>` and one instance of `<put refresh token here...>`).

```yaml
seistore:
  service: '{"azure": {"azureGlabEnv":{"url": "https://<meds-instance-url>/seistore-svc/api/v3", "appkey": ""}}}'
  url: 'https://<meds-instance-url>/seistore-svc/api/v3'
  cloud_provider: 'azure'
  env: 'glab'
  auth-mode: 'JWT Token'
  ssl_verify: False
auth_provider:
  azure: '{ "provider": "azure", "authorize_url": "https://login.microsoftonline.com/", "oauth_token_host_end": "/oauth2/token", "scope_end":"/.default openid profile offline_access", "redirect_uri":"http://localhost:8080", "login_grant_type": "refresh_token", "refresh_token": "<put refresh token here from auth_token.http authorize request>" }'
azure:
  empty: 'none'
```
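After editing the file, it can help to sanity-check that it still contains the keys sdutil's Azure configuration relies on. The following helper is illustrative only (the function name and the key list are assumptions, not part of sdutil):

```shell
# check_config: grep a config file for the keys the Azure setup relies on.
# Illustrative helper only; not part of sdutil itself.
check_config() {
  local f="$1" key
  for key in seistore auth_provider cloud_provider refresh_token; do
    if ! grep -q "$key" "$f"; then
      echo "missing key: $key"
      return 1
    fi
  done
  echo "config looks complete"
}
```

For example, `check_config sdlib/config.yaml` reports the first expected key that is absent, or confirms that all of them are present.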
Note
If a token isn't already present, obtain one by following the directions in How to generate auth token.
Export or set the following environment variables:
```bash
export AZURE_TENANT_ID=<your-tenant-id>
export AZURE_CLIENT_ID=<your-client-id>
export AZURE_CLIENT_SECRET=<your-client-secret>
```
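A quick way to confirm all three variables are set before signing in is a small shell check like the following (the function name is illustrative, not part of sdutil):

```shell
# check_azure_env: report which of the required variables are unset.
# Illustrative helper; not part of sdutil. Requires bash (indirect expansion).
check_azure_env() {
  local missing="" var
  for var in AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET; do
    if [ -z "${!var}" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Running `check_azure_env` prints `ok` when all three variables are set, and otherwise lists the missing names.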
Running the tool
Run the sdutil tool from the extracted utility folder:

```bash
python sdutil
```
If you don't specify any arguments, this menu appears:
```
Seismic Store Utility

> python sdutil [command]

available commands:

 * auth    : authentication utilities
 * unlock  : remove a lock on a seismic store dataset
 * version : print the sdutil version
 * rm      : delete a subproject or a space separated list of datasets
 * mv      : move a dataset in seismic store
 * config  : manage the utility configuration
 * mk      : create a subproject resource
 * cp      : copy data to(upload)/from(download)/in(copy) seismic store
 * stat    : print information like size, creation date, legal tag(admin)
             for a space separated list of tenants, subprojects or datasets
 * patch   : patch a seismic store subproject or dataset
 * app     : application authorization utilities
 * ls      : list subprojects and datasets
 * user    : user authorization utilities
```
If this is your first time using the tool, run the `sdutil config init` command to initialize the configuration:

```bash
python sdutil config init
```
Before you start using the tool and performing any operations, you must sign in to the system. When you run the following command, sdutil opens a sign-in page in a web browser:
```bash
python sdutil auth login
```
After you successfully sign in, your credentials are valid for a week. You don't need to sign in again unless the credentials expire.
Note
If you aren't getting the message about successful sign-in, make sure that your three environment variables are set and that you followed all steps in the Configuration section earlier in this tutorial.
Seismic Store resources
Before you start using the system, it's important to understand how Seismic Store manages resources. Seismic Store manages three types of resources:
- Tenant project: The main project. The tenant is the first section of the Seismic Store path.
- Subproject: The working subproject, which is directly linked under the main tenant project. The subproject is the second section of the Seismic Store path.
- Dataset: The dataset entity. The dataset is the third and last section of the Seismic Store path. You can specify the dataset resource by using the form
path/dataset_name
. In that form,path
is optional and has the same meaning as a directory in a generic file system. Thedataset_name
part is the name of the dataset entity.
The Seismic Store URI is a string that you use to uniquely address a resource in the system. You can obtain it by appending the prefix `sd://` to the required resource path:

```
sd://<tenant>/<subproject>/<path>*/<dataset>
```
For example, if you have a `results.segy` dataset stored in the `qadata/ustest` directory structure in the `carbon` subproject under the `gtc` tenant project, the corresponding sdpath is:

```
sd://gtc/carbon/qadata/ustest/results.segy
```
You can address every resource by using the corresponding sdpath section:

```
Tenant     : sd://gtc
Subproject : sd://gtc/carbon
Dataset    : sd://gtc/carbon/qadata/ustest/results.segy
```
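To make the three sections concrete, here's a small sketch that extracts each one from an sdpath string. The helper names are hypothetical; sdutil parses these paths internally:

```shell
# Extract the tenant, subproject, and dataset sections from an sdpath.
# Hypothetical helpers for illustration only.
sdpath_tenant()     { echo "$1" | sed -E 's|^sd://([^/]+).*|\1|'; }
sdpath_subproject() { echo "$1" | sed -E 's|^sd://[^/]+/([^/]+).*|\1|'; }
sdpath_dataset()    { echo "$1" | sed -E 's|^sd://[^/]+/[^/]+/(.*)$|\1|'; }
```

For example, `sdpath_subproject "sd://gtc/carbon/qadata/ustest/results.segy"` prints `carbon`, and `sdpath_dataset` on the same URI prints `qadata/ustest/results.segy`.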
Subprojects
A subproject in Seismic Store is a working unit where a user can save datasets. The system can handle multiple subprojects under a tenant project.
Only a tenant admin can create a subproject resource by using the following sdutil command:
```
> python sdutil mk *sdpath *admin@email *legaltag (options)

  create a new subproject resource in Seismic Store. user can interactively
  set the storage class for the subproject. only tenant admins are allowed
  to create subprojects.

  *sdpath      : the seismic store subproject path. sd://<tenant>/<subproject>
  *admin@email : the email of the user to be set as the subproject admin
  *legaltag    : the default legal tag for the created subproject

  (options)    | --idtoken=<token> pass the credential token to use,
                 rather than generating a new one
```
User management
To be able to use Seismic Store, users must be registered in at least one subproject resource with a role that defines their access level. Seismic Store supports two roles scoped at the subproject level:
- Admin: Read/write access and user management.
- Viewer: Read/list access.
Only a subproject admin can register a user by using the following sdutil command:
```
> python sdutil user [ *add | *list | *remove | *roles ] (options)

  *add  $ python sdutil user add [user@email] [sdpath] [role]*
        add a user to a subproject resource

        [user@email] : email of the user to add
        [sdpath]     : seismic store subproject path, sd://<tenant>/<subproject>
        [role]       : user role [admin|viewer]
```
Usage examples
The following code is an example of how to use sdutil to manage datasets with Seismic Store. This example uses `sd://gtc/carbon` as the subproject resource.
```bash
# Create a new file
echo "My Test Data" > data1.txt

# Upload the created file to Seismic Store
./sdutil cp data1.txt sd://gtc/carbon/test/mydata/data.txt

# List the contents of the Seismic Store subproject
./sdutil ls sd://gtc/carbon/test/mydata/   # (display: data.txt)
./sdutil ls sd://gtc                       # (display: carbon)
./sdutil ls sd://gtc/carbon                # (display: test/)
./sdutil ls sd://gtc/carbon/test           # (display: data/)

# Download the file from Seismic Store
./sdutil cp sd://gtc/carbon/test/mydata/data.txt data2.txt

# Check if the original file matches the one downloaded from Seismic Store
diff data1.txt data2.txt
```
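When you have many local files, the single-file `cp` pattern above extends naturally to a loop. The sketch below is a dry run: it only prints the commands it would run, so you can inspect the destination sdpaths before executing anything. The function name is hypothetical, not part of sdutil:

```shell
# batch_upload_preview: print one `sdutil cp` command per file in a directory.
# Dry-run sketch; review the output, then pipe it to `sh` to actually upload.
batch_upload_preview() {
  local src_dir="$1" dest_sdpath="$2" f
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue
    echo "./sdutil cp $f $dest_sdpath/$(basename "$f")"
  done
}
```

For example, `batch_upload_preview ./local-data sd://gtc/carbon/test` prints one upload command per regular file in `./local-data`.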
Tool testing
The test folder contains a set of integral/unit and regression tests written for pytest. Run these tests to validate the sdutil tool's functionalities.
Install the test requirements:

```bash
pip install -r test/e2e/requirements.txt
```
Run the integral/unit tests:

```bash
./devops/scripts/run_unit_tests.sh
```

Test execution parameters:

- `--mnt-volume`: sdapi root dir (default `"."`)
Run the regression tests:

```bash
./devops/scripts/run_regression_tests.sh --cloud-provider= --service-url= --service-key= --idtoken= --tenant= --subproject=
```

Test execution parameters:

- `--mnt-volume`: sdapi root dir (default `"."`)
- `--disable-ssl-verify`: disable SSL verification
FAQ
How can I generate a new command for the tool?
Run the command generation script (`command_gen.py`) to automatically generate the base infrastructure for integrating a new command in the sdutil tool. The script creates a folder with the command infrastructure in `sdlib/cmd/new_command_name`.

```bash
./scripts/command_gen.py new_command_name
```
How can I delete all files in a directory?
Use the following command:

```bash
./sdutil ls -lr sd://tenant/subproject/your/folder/here | xargs -r ./sdutil rm --idtoken=x.xxx.x
```
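Because that pipeline deletes everything it lists, it's worth previewing the dataset list first. One cautious pattern (the helper is illustrative, not part of sdutil) is to turn the listing into `rm` commands without running them:

```shell
# preview_rm: turn a list of sdpaths on stdin into `sdutil rm` commands.
# Illustrative dry run; review the output, then pipe it to `sh` to execute.
preview_rm() {
  local ds
  while IFS= read -r ds; do
    [ -n "$ds" ] && echo "./sdutil rm $ds"
  done
}
```

For example, `./sdutil ls -lr sd://tenant/subproject/your/folder/here | preview_rm` shows exactly which delete commands would run.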
How can I generate the tool's changelog?
Run the changelog script (`changelog-generator.sh`) to automatically generate the tool's changelog:

```bash
./scripts/changelog-generator.sh
```
Usage for Azure Data Manager for Energy
The Azure Data Manager for Energy instance uses the OSDU® M12 version of sdutil. Complete the following steps if you want to use sdutil to take advantage of the Scientific Data Management System (SDMS) API of your Azure Data Manager for Energy instance:
Ensure that you followed the earlier installation and configuration steps. These steps include downloading the sdutil source code, configuring your Python virtual environment, editing the `config.yaml` file, and setting your three environment variables. Then run the following commands to do tasks in Seismic Store.
Initialize:

```bash
(sdutilenv) > python sdutil config init
[one] Azure
Select the cloud provider: # enter 1
Insert the Azure (azureGlabEnv) application key: # just press Enter; no need to provide a key
sdutil successfully configured to use Azure (azureGlabEnv)
```

A sign-in success message should appear. Credentials expiry is set to 1 hour.
Sign in:

```bash
python sdutil config init
python sdutil auth login
```
List files in Seismic Store:

```bash
python sdutil ls sd://<tenant>                # for example, sd://<instance-name>-<datapartition>
python sdutil ls sd://<tenant>/<subproject>   # for example, sd://<instance-name>-<datapartition>/test
```
Upload a file from your local machine to Seismic Store:

```bash
python sdutil cp local-dir/file-name-at-source.txt sd://<datapartition>/test/file-name-at-destination.txt
```
Download a file from Seismic Store to your local machine:

```bash
python sdutil cp sd://<datapartition>/test/file-name-at-ddms.txt local-dir/file-name-at-destination.txt
```
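After a download, you can confirm the copy is byte-identical to the original with a standard checksum comparison. The helper name is illustrative, not part of sdutil:

```shell
# verify_copy: compare two files by checksum and report match/differ.
# Illustrative helper using the portable `cksum` utility.
verify_copy() {
  if [ "$(cksum < "$1")" = "$(cksum < "$2")" ]; then
    echo "match"
  else
    echo "differ"
  fi
}
```

For example, `verify_copy original.txt downloaded.txt` prints `match` only when the two files have identical contents, which generalizes the `diff data1.txt data2.txt` check used earlier in this tutorial.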
Note
Don't use the `cp` command to download VDS files. The VDS conversion results in multiple files, so the `cp` command can't download all of them in one command. Use either the SEGYExport or VDSCopy tool instead. These tools use a series of REST calls that access a naming scheme to retrieve information about all the resulting VDS files.
OSDU® is a trademark of The Open Group.
Next step
Advance to the next tutorial: