Copy

The copy function is a utility designed to streamline the organization and selection of satellite data by creating a filtered subset from a larger dataset. This is particularly useful when the output directory from previous downloads contains excess data, such as files from multiple regions, time periods, or bands, making it challenging to isolate the relevant data for further analysis.

The copy function enables you to:

  • Filter by Spatial Extent: Use a shapefile to select only the data that overlaps with your area of interest, ensuring spatial relevance.

  • Filter by Filename Pattern: Apply patterns to include specific bands, file types, or other naming conventions, making it easy to extract only the required files.

  • Filter by Time Period: Specify a date range to extract files corresponding to a specific temporal interval, critical for time-series analysis.

  • Convert and Extract: .zip and .tar archives are extracted, .jp2 files are converted to GeoTIFF, and .hdf files are unpacked with each image saved as a GeoTIFF. For Sentinel data, the view and solar zenith and azimuth angles are extracted into files called ...B.tif.

  • Include Offsets and Scaling Factors: For each GeoTIFF, the offsets and scaling factors needed to convert the digital numbers back to physical units are extracted from the metadata files and stored as a tag inside the GeoTIFF.
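The stored offsets and scaling factors are applied as a linear transform. The sketch below assumes the GDAL convention, value = DN * scale + offset. Whether copy writes the tag in a form that GDAL reads back through Band.GetScale() and Band.GetOffset() is an assumption. The helper to_physical is hypothetical and works on plain Python lists. As an example, Sentinel-2 L2A products from processing baseline 04.00 onward define reflectance as (DN + BOA_ADD_OFFSET) / QUANTIFICATION_VALUE, with -1000 and 10000. In scale/offset form that is scale 0.0001 and offset -0.1.

```python
def to_physical(dn_values, scale=1.0, offset=0.0, nodata=None):
    """Convert digital numbers to physical values: value = dn * scale + offset.

    Hypothetical helper; assumes the GDAL scale/offset convention.
    Pixels equal to nodata are returned as None instead of being scaled.
    """
    result = []
    for dn in dn_values:
        if nodata is not None and dn == nodata:
            result.append(None)  # keep nodata pixels out of the transform
        else:
            result.append(dn * scale + offset)
    return result


# Sentinel-2 L2A style: (DN - 1000) / 10000 == DN * 0.0001 - 0.1
reflectance = to_physical([0, 10000, 5000], scale=0.0001, offset=-0.1, nodata=0)
```

Masking nodata before scaling matters: with these parameters a fill value of 0 would otherwise turn into a plausible-looking reflectance of -0.1.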

By leveraging these filters, the copy function ensures that only the necessary data is available for downstream processing, reducing storage requirements and simplifying workflows. This is especially valuable when managing large datasets from multiple downloads or satellite sources.

1. Provide a Shapefile for Spatial Filtering

Define the path to a shapefile representing the area of interest. Only GeoTIFF files that overlap the shape's boundaries are copied. This matters because downloads often contain more data than needed: most data providers only allow filtering by a bounding box rather than by the exact shape. Supported formats: .shp, .geojson, .kml, .gpkg.
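A cheap way to decide whether a raster can overlap the area of interest is an axis-aligned bounding-box test. The sketch below shows only that test. The BBox type and bbox_overlaps helper are hypothetical, both boxes are assumed to be in the same coordinate reference system, and copy itself may use the exact geometry instead.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box; coordinates assumed to share one CRS."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def bbox_overlaps(a: BBox, b: BBox) -> bool:
    """True if the boxes share any area or touch along an edge or corner."""
    return (a.minx <= b.maxx and b.minx <= a.maxx and
            a.miny <= b.maxy and b.miny <= a.maxy)


aoi = BBox(10.0, 50.0, 11.0, 51.0)
tile = BBox(10.5, 50.5, 12.0, 52.0)
print(bbox_overlaps(aoi, tile))  # True: the tile covers the AOI's north-east corner
```

A positive box test is necessary but not sufficient: an L-shaped area of interest can have a bounding box that overlaps a tile while the shape itself does not. This is exactly the excess data that provider-side bounding-box filtering lets through, so an exact geometric check (for example shapely's intersects) is the natural second step.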

2. Apply a Filename Pattern (Optional)

Use a pattern to filter files by name. This is useful for copying specific bands or subsets of data. For ZIP archives, TAR archives, and HDF files, the pattern is applied to the files inside them, not to the archive itself. The pattern supports wildcards for flexible matching; it is not a regular expression.

  • * matches any sequence of characters (including none).
  • ? matches a single character.
  • [seq] matches any character in the sequence.
  • [!seq] matches any character not in the sequence.

The default is the wildcard [*], which matches every file.
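The wildcard rules listed above are the same ones implemented by Python's fnmatch module. The sketch below is a minimal sketch assuming a list of patterns means "keep a file if any pattern matches". It uses that module to select files by name and to extract only the matching members of a ZIP archive. The helpers matches and extract_matching are hypothetical, not part of sipt. fnmatchcase is used so matching is case-sensitive on every operating system, and * also matches "/", so folder names inside an archive are part of the matched name.

```python
import zipfile
from fnmatch import fnmatchcase


def matches(name, patterns="*"):
    """True if name matches the pattern, or any pattern in a list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    # Shell-style wildcards: *, ?, [seq], [!seq]; no regex syntax.
    return any(fnmatchcase(name, p) for p in patterns)


def extract_matching(archive, dest, patterns="*"):
    """Extract only the archive members whose names match the pattern(s)."""
    with zipfile.ZipFile(archive) as zf:
        # The filter runs on the member names, not on the archive's own name.
        selected = [n for n in zf.namelist() if matches(n, patterns)]
        zf.extractall(dest, members=selected)
    return selected


print(matches("T32UPU_20240801_B02.jp2", "*B0[123]*"))  # True
print(matches("T32UPU_20240801_B04.jp2", "*B0[123]*"))  # False
```

Note that a pattern such as *B1 only matches names that end in B1, so a file like scene_B1.tif needs *B1* or *B1.tif.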

3. Set Temporal Filters (Optional)

Define a start and end date to filter files by their recording dates. To decide whether a file qualifies, the function uses the earliest date found in its file name or path.

Accepted formats: yyyymmdd or yyyy-mm-dd.

Default: No date filtering (all files are included).
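The date rule can be sketched as follows: find every yyyymmdd or yyyy-mm-dd run in the path, keep the ones that are valid calendar dates, and compare the earliest against the bounds. The helpers are hypothetical. Treating the bounds as inclusive and excluding undated files whenever a bound is set are assumptions, not documented behavior of copy.

```python
import re
from datetime import date

# yyyymmdd or yyyy-mm-dd, not embedded in a longer run of digits.
DATE_RE = re.compile(r"(?<!\d)(\d{4})-?(\d{2})-?(\d{2})(?!\d)")


def earliest_date(path):
    """Return the earliest valid date found in path, or None."""
    found = []
    for y, m, d in DATE_RE.findall(path):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:
            continue  # digit runs like 20241340 are not real dates
    return min(found) if found else None


def parse_bound(text):
    """Parse a bound given as yyyymmdd or yyyy-mm-dd."""
    digits = text.replace("-", "")
    return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))


def in_period(path, start=None, end=None):
    """Inclusive date filter; with no bounds every file passes."""
    if start is None and end is None:
        return True
    d = earliest_date(path)
    if d is None:
        return False  # assumption: undated files are dropped when filtering
    if start is not None and d < parse_bound(start):
        return False
    if end is not None and d > parse_bound(end):
        return False
    return True
```

Using the earliest date matters for names like Sentinel-2 product IDs, which carry both a sensing date and a later processing date.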

Multi-Threading

To improve efficiency, the copy function uses multi-threading (not multi-processing). The number of threads is set with the num_processes parameter. As a general rule, use no more threads than the machine has physical cores; additional threads do not make execution faster.
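Threads suit this kind of work because CPython releases the global interpreter lock during file reads and writes, so several copies can overlap their disk I/O. The sketch below is a minimal sketch of a thread pool sized by a num_processes argument, mirroring the documented parameter name. The helper copy_files_threaded is hypothetical and not sipt's implementation. When choosing the value, note that os.cpu_count() reports logical CPUs. psutil.cpu_count(logical=False) returns physical cores if psutil is installed.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files_threaded(pairs, num_processes=4):
    """Copy (src, dst) file pairs using a pool of num_processes threads."""

    def copy_one(pair):
        src, dst = pair
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        return shutil.copy2(src, dst)  # copies data and file metadata

    with ThreadPoolExecutor(max_workers=num_processes) as pool:
        # list() collects results in input order; a failed copy
        # re-raises its exception here.
        return list(pool.map(copy_one, pairs))
```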

Example

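```python
from sipt.processing import copy

# Select bands 1-3 with a single wildcard pattern
# (note: this also matches names containing B10, B11, B12, ...)
file_pattern = "*B[123]*"
# ...or with a list of patterns
file_pattern = ["*B1*", "*B2*", "*B3*"]

# Positional arguments: the two directory paths, the shapefile,
# the file pattern(s), start date, end date and the number of threads.
copy("./data", "./src", "./shapefile.geojson", file_pattern, "2024-08-01", "2024-08-03", 8)
```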