You can download files in the straightforward, interactive case shown below, which makes getting a file pretty easy. You could also check out pooch
for this purpose (which fetch_data
is built around).
Simple case¶
[2]:
import fetch_data as fd
url = (
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc"
)
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/avhrr-only-v2.19811101.nc
/Users/luke/Downloads/avhrr-only-v2.19811102.nc
Using wildcards¶
You can also use wildcards if you want to download multiple files from a server. Note that the server needs to support this (especially for HTTP). In this case, we get three files from an FTP server. Wildcards are also useful when the file names are not consistent.
[3]:
url = "ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2/sst.ltm.*.nc"
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/sst.ltm.1961-1990.nc
/Users/luke/Downloads/sst.ltm.1971-2000.nc
/Users/luke/Downloads/sst.ltm.1981-2010.nc
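Conceptually, wildcard expansion amounts to matching the pattern against the server's directory listing. A minimal sketch of that idea using Python's fnmatch (illustrative only — the listing below is made up, and this is not fetch_data's actual implementation):

```python
from fnmatch import fnmatch

# Hypothetical directory listing (the real one comes from the FTP server)
listing = [
    "sst.ltm.1961-1990.nc",
    "sst.ltm.1971-2000.nc",
    "sst.ltm.1981-2010.nc",
    "sst.mnmean.nc",
]

pattern = "sst.ltm.*.nc"  # the wildcard portion of the URL
matches = [name for name in listing if fnmatch(name, pattern)]
print("\n".join(matches))
```

Only the three `sst.ltm.*` climatology files match; `sst.mnmean.nc` is filtered out.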
Downloading compressed files¶
Compressed files are decompressed automatically based on the file extension. The extensions currently supported are: zip, gz, tar.
Below you can see what the output looks like.
[5]:
url = "https://www.metoffice.gov.uk/hadobs/en4/data/en4-2-1/EN.4.2.1.analyses.g10.2001.zip"
flist = fd.download(url, dest='~/Downloads', verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200104.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200110.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200111.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200101.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200105.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200108.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200109.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200112.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200102.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200106.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200107.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200103.nc
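Under the hood, extension-based decompression boils down to dispatching on the file suffix. A hedged sketch of such a dispatcher using only the standard library (this illustrates the idea; fetch_data's internals may differ):

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def decompress(path: str) -> list:
    """Decompress a file based on its extension; return the extracted paths."""
    p = Path(path)
    # mirror the "<name>.unzip" directory seen in the output above
    dest = p.parent / (p.name + ".unzip")
    dest.mkdir(exist_ok=True)
    if p.suffix == ".zip":
        with zipfile.ZipFile(p) as zf:
            zf.extractall(dest)
    elif p.name.endswith((".tar", ".tar.gz")):
        with tarfile.open(p) as tf:  # tarfile handles .tar.gz transparently
            tf.extractall(dest)
    elif p.suffix == ".gz":
        out = dest / p.stem  # drop the .gz suffix
        with gzip.open(p, "rb") as fin, open(out, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        return [str(p)]  # not compressed: nothing to do
    return sorted(str(f) for f in dest.iterdir())
```

The order of the checks matters: `.tar.gz` must be tested before the bare `.gz` branch, since both end in `.gz`.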
YAML catalog¶
The advantage of using a catalog is that all the information needed to download a file is stored in a single place. This makes it well suited to data pipelines.
Below is the format that your yaml
file should take:
# the name of the catalog entry goes here
oisst_ice:
# url is compulsory. It may contain format placeholders (e.g. {year}), which must be passed to download as keyword arguments
url: ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.{year}.nc
# will default to ~/Downloads if not present.
dest: ~/Downloads/NOAA_OISST/{year}/
# a human-readable name for the dataset
name: NOAA Optimally Interpolated Sea Surface Temperature
meta: # all entries in the meta will be written to README.txt file in the dest
description: >
Optimally interpolated sea surface temperature
citation: >
Reynolds, R.W., N.A. Rayner, T.M. Smith, D.C. Stokes, and W. Wang,
2002: An improved in situ and satellite SST analysis for climate.
J. Climate, 15, 1609-1625.
doi: https://doi.org/10.1175/1520-0442(2002)015%3C1609:AIISAS%3E2.0.CO;2
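The {year} placeholders are ordinary Python format fields: keyword arguments passed to download are substituted into the url and dest strings. A minimal sketch of that substitution (illustrative — fetch_data performs this for you when you pass `year=...`):

```python
# a pared-down catalog entry with {year} placeholders
entry = {
    "url": "ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.{year}.nc",
    "dest": "~/Downloads/NOAA_OISST/{year}/",
}

kwargs = {"year": 2000}
# fill each placeholder with the matching keyword argument
resolved = {key: value.format(**kwargs) for key, value in entry.items()}
print(resolved["url"])
# ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.2000.nc
```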
[18]:
import fetch_data as fd
cat = fd.read_catalog('../tests/example_catalog.yml')
flist = fd.download(**cat['oisst_ice'], year=2000)
print('\n'.join(flist))
/Users/luke/Git/fetch-data/docs/tests/downloads/oisstv2/icec.day.mean.2000.nc
Logging¶
fetch_data
also logs to your session and/or to a file. This is useful for tracking the progress of many downloads. It is also useful when some files fail to download, since failures are recorded.
[19]:
url = (
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc"
)
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=True)
2021-04-09 22:53:37 [DOWNLOAD] ================================================================================
2021-04-09 22:53:37 [DOWNLOAD] Start of logging session
2021-04-09 22:53:37 [DOWNLOAD] --------------------------------------------------------------------------------
2021-04-09 22:53:37 [DOWNLOAD] 2 files at https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc
2021-04-09 22:53:37 [DOWNLOAD] Files will be saved to /Users/luke/Downloads
2021-04-09 22:53:37 [DOWNLOAD] retrieving https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc
2021-04-09 22:53:37 [DOWNLOAD] retrieving https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc
2021-04-09 22:53:37 [DOWNLOAD] SUMMARY: Retrieved=2, Failed=0 listing failed below:
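The log format above (a timestamp followed by a [DOWNLOAD] tag) can be reproduced with the standard logging module. A hedged sketch of such a configuration (illustrative only — this is not fetch_data's actual logger setup):

```python
import logging

logger = logging.getLogger("download")
logger.setLevel(logging.INFO)

# timestamp + [DOWNLOAD] tag, matching the session output shown above
fmt = logging.Formatter(
    "%(asctime)s [DOWNLOAD] %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)

# log to the session; add a logging.FileHandler to also log to a file
stream = logging.StreamHandler()
stream.setFormatter(fmt)
logger.addHandler(stream)

logger.info("Start of logging session")
```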