You can download files in the straightforward, interactive case shown below, which makes getting a file pretty easy. You could also check out pooch
for this purpose (which fetch_data
is built around).
Simple case¶
[2]:
import fetch_data as fd
url = (
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc"
)
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/avhrr-only-v2.19811101.nc
/Users/luke/Downloads/avhrr-only-v2.19811102.nc
Using wildcards¶
You can also use wildcards if you want to download multiple files from a server. Note that the server needs to support this (especially for HTTP). In this case, we get three files from an FTP server. Wildcards are also useful when the file names are not consistent.
[3]:
url = "ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2/sst.ltm.*.nc"
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/sst.ltm.1961-1990.nc
/Users/luke/Downloads/sst.ltm.1971-2000.nc
/Users/luke/Downloads/sst.ltm.1981-2010.nc
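Conceptually, wildcard expansion amounts to matching the pattern against the server's directory listing. A minimal sketch of that idea using Python's fnmatch (illustrative only — the listing below is made up, and this is not fetch_data's actual implementation):

```python
from fnmatch import fnmatch

# Hypothetical directory listing (the real one comes from the FTP server)
listing = [
    "sst.ltm.1961-1990.nc",
    "sst.ltm.1971-2000.nc",
    "sst.ltm.1981-2010.nc",
    "sst.mnmean.nc",
]

pattern = "sst.ltm.*.nc"  # the wildcard portion of the URL
matches = [name for name in listing if fnmatch(name, pattern)]
print("\n".join(matches))
```

Only the three `sst.ltm.*` climatology files match; `sst.mnmean.nc` is filtered out.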
Downloading compressed files¶
Compressed files are decompressed automatically based on the file extension. The extensions currently supported are: zip, gz, tar.
Below you can see what the output looks like.
[5]:
url = "https://www.metoffice.gov.uk/hadobs/en4/data/en4-2-1/EN.4.2.1.analyses.g10.2001.zip"
flist = fd.download(url, dest='~/Downloads', verbose=False)
print('\n'.join(flist))
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200104.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200110.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200111.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200101.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200105.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200108.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200109.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200112.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200102.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200106.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200107.nc
/Users/luke/Downloads/EN.4.2.1.analyses.g10.2001.zip.unzip/EN.4.2.1.f.analysis.g10.200103.nc
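Under the hood, extension-based decompression boils down to dispatching on the file suffix. A hedged sketch of such a dispatcher using only the standard library (this illustrates the idea; fetch_data's internals may differ):

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def decompress(path: str) -> list:
    """Decompress a file based on its extension; return the extracted paths."""
    p = Path(path)
    # mirror the "<name>.unzip" directory seen in the output above
    dest = p.parent / (p.name + ".unzip")
    dest.mkdir(exist_ok=True)
    if p.suffix == ".zip":
        with zipfile.ZipFile(p) as zf:
            zf.extractall(dest)
    elif p.name.endswith((".tar", ".tar.gz")):
        with tarfile.open(p) as tf:  # tarfile handles .tar.gz transparently
            tf.extractall(dest)
    elif p.suffix == ".gz":
        out = dest / p.stem  # drop the .gz suffix
        with gzip.open(p, "rb") as fin, open(out, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        return [str(p)]  # not compressed: nothing to do
    return sorted(str(f) for f in dest.iterdir())
```

The order of the checks matters: `.tar.gz` must be tested before the bare `.gz` branch, since both end in `.gz`.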
YAML catalog¶
The advantage of using a catalog is that all the information needed to download a file is stored in a single place. This makes it well suited to data pipelines.
Below is the format that your yaml
file should take:
# the name of the catalog entry goes here
oisst_ice:
# url is compulsory. It may contain format placeholders (e.g. {year}), which must be passed to download as keyword arguments
url: ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.{year}.nc
# will default to ~/Downloads if not present.
dest: ~/Downloads/NOAA_OISST/{year}/
# a human-readable name for the dataset
name: NOAA Optimally Interpolated Sea Surface Temperature
meta: # all entries in the meta will be written to README.txt file in the dest
description: >
Optimally interpolated sea surface temperature
citation: >
Reynolds, R.W., N.A. Rayner, T.M. Smith, D.C. Stokes, and W. Wang,
2002: An improved in situ and satellite SST analysis for climate.
J. Climate, 15, 1609-1625.
doi: https://doi.org/10.1175/1520-0442(2002)015%3C1609:AIISAS%3E2.0.CO;2
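The {year} placeholders are ordinary Python format fields: keyword arguments passed to download are substituted into the url and dest strings. A minimal sketch of that substitution (illustrative — fetch_data performs this for you when you pass `year=...`):

```python
# a pared-down catalog entry with {year} placeholders
entry = {
    "url": "ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.{year}.nc",
    "dest": "~/Downloads/NOAA_OISST/{year}/",
}

kwargs = {"year": 2000}
# fill each placeholder with the matching keyword argument
resolved = {key: value.format(**kwargs) for key, value in entry.items()}
print(resolved["url"])
# ftp://ftp2.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/icec.day.mean.2000.nc
```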
[18]:
import fetch_data as fd
cat = fd.read_catalog('../tests/example_catalog.yml')
flist = fd.download(**cat['oisst_ice'], year=2000)
print('\n'.join(flist))
/Users/luke/Git/fetch-data/docs/tests/downloads/oisstv2/icec.day.mean.2000.nc
Logging¶
fetch_data
also logs to your session and/or to a file. This is useful for tracking the progress of many downloads. It is also useful when some files fail to download, since failures are recorded.
[19]:
url = (
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc"
)
flist = fd.download(url, dest='~/Downloads', n_jobs=1, verbose=True)
2021-04-09 22:53:37 [DOWNLOAD] ================================================================================
2021-04-09 22:53:37 [DOWNLOAD] Start of logging session
2021-04-09 22:53:37 [DOWNLOAD] --------------------------------------------------------------------------------
2021-04-09 22:53:37 [DOWNLOAD] 2 files at https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc
2021-04-09 22:53:37 [DOWNLOAD] Files will be saved to /Users/luke/Downloads
2021-04-09 22:53:37 [DOWNLOAD] retrieving https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811101.nc
2021-04-09 22:53:37 [DOWNLOAD] retrieving https://www.ncei.noaa.gov/thredds/fileServer/OisstBase/NetCDF/V2.0/AVHRR/198111/avhrr-only-v2.19811102.nc
2021-04-09 22:53:37 [DOWNLOAD] SUMMARY: Retrieved=2, Failed=0 listing failed below:
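The log format above (a timestamp followed by a [DOWNLOAD] tag) can be reproduced with the standard logging module. A hedged sketch of such a configuration (illustrative only — this is not fetch_data's actual logger setup):

```python
import logging

logger = logging.getLogger("download")
logger.setLevel(logging.INFO)

# timestamp + [DOWNLOAD] tag, matching the session output shown above
fmt = logging.Formatter(
    "%(asctime)s [DOWNLOAD] %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)

# log to the session; add a logging.FileHandler to also log to a file
stream = logging.StreamHandler()
stream.setFormatter(fmt)
logger.addHandler(stream)

logger.info("Start of logging session")
```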