Python libraries I use in my research
Which you may (or may not) consider suitable for your research too
- General
- Geographical analysis
- Radar-related (nowcasting)
- Machine learning
- Plotting
- Code development
- Misc
- Dependencies (non-direct usage)
- The environment
My programming experience is fairly limited. Like many pupils in Russia, I learned Pascal for a couple of years in school. Then came Fortran at university, and during my PhD in Russia I wrote a couple of scripts in Fortran too. After that I stuck with R for a couple of projects and finished my small journey with Python, which has been my primary programming language since 2015.
Python does have a pretty powerful standard library, but Python’s real power lies in third-party libraries that provide almost endless functionality.
Below I categorize Python libraries I use in my research.
General
- numpy – the fundamental library for scientific computing.
- pandas – the fundamental library for data analysis.
- scipy – provides efficient numerical routines for, e.g., interpolation and optimization.
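As a small illustration of how these three work together, here is a sketch (with hypothetical data) that fills a gap in a pandas time series using scipy's 1-D linear interpolation:

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Hypothetical daily temperature series with one missing value
dates = pd.date_range("2020-01-01", periods=5, freq="D")
temps = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], index=dates)

# Fit a linear interpolant on the valid points (timestamps as int64 ns)
valid = temps.dropna()
f = interpolate.interp1d(valid.index.astype("int64"), valid.values)

# Evaluate it on the full index to fill the gap
filled = pd.Series(f(temps.index.astype("int64")), index=temps.index)
print(filled.iloc[2])  # → 3.0
```

pandas has its own `Series.interpolate`, of course, but going through scipy gives you access to many more interpolation schemes (splines, nearest-neighbor, etc.).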
Geographical analysis
- geopandas – pandas for geographical data. If you have .shp or .geojson files to analyze, you definitely should give geopandas a try!
- xarray – the best library for manipulation of meteorological datasets provided in netcdf or grib formats.
- cdo – I use it exclusively for remapping meteorological data from one grid to another.
Radar-related (nowcasting)
- wradlib – the core library for weather radar data processing.
- opencv – the fundamental library for computer vision problems (e.g., computation of optical flow).
- scikit-image – image processing without pain.
- trackpy – particle tracking made easy.
Machine learning
- scikit-learn – the core package for machine learning applications.
- tensorflow – deep neural network development in a few lines of code with the built-in keras API.
- fbprophet – time-series forecasting library developed by Facebook. I use it for water level forecasting in OpenLevels.
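To give a feel for why scikit-learn is the go-to entry point, here is a minimal sketch (with made-up gauge readings, not data from OpenLevels) of its fit/predict workflow:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical example: relate an upstream gauge reading to a downstream one
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # upstream water level, m
y = np.array([2.1, 4.0, 6.1, 8.0])          # downstream water level, m

# Every scikit-learn estimator follows the same fit/predict interface
model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[5.0]]))
print(pred[0])  # → 10.0
```

The uniform estimator interface is the real selling point: swapping `LinearRegression` for, say, `RandomForestRegressor` changes one line.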
Plotting
- matplotlib – the core library for plotting.
- seaborn – nicer plots, pandas support.
- bokeh – interactive plots. For example, runoff forecasts produced by OpenForecast.
- folium – interactive maps, e.g. like in 1, 2, 3, 4.
- branca – customization of html popups in folium.
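A typical matplotlib snippet in my kind of work looks like the sketch below (synthetic runoff values, hypothetical file name), which renders a time series straight to a PNG:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to file, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic runoff series: a seasonal signal plus reproducible noise
days = np.arange(30)
runoff = 10 + 5 * np.sin(days / 5) + np.random.default_rng(0).normal(0, 0.5, 30)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(days, runoff, marker="o", markersize=3)
ax.set_xlabel("Day")
ax.set_ylabel("Runoff, m$^3$/s")
fig.tight_layout()
fig.savefig("runoff.png", dpi=150)
```

Adding `import seaborn as sns; sns.set_theme()` before plotting restyles the same figure with seaborn's nicer defaults, without changing any plotting code.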
Code development
- jupyter – interactive IDE running in your web browser.
- pylint – static analysis of your code.
Misc
- h5py – make use of HDF files.
- numba – accelerates your numerical code via JIT compilation.
- requests – API requests for humans.
Dependencies (non-direct usage)
- pytables – HDF files processing with pandas.
- xlrd – reading xls with pandas.
- dask – recommended xarray dependency.
- netCDF4 – recommended xarray dependency.
- bottleneck – recommended xarray dependency.
- cartopy – recommended xarray dependency.
- pynio – recommended xarray dependency.
- pseudonetcdf – recommended xarray dependency.
- nc-time-axis – recommended xarray dependency.
- cfgrib – reading grib files.
- eccodes – cfgrib dependency.
- descartes – plotting polygons with geopandas.
- pims – recommended trackpy dependency.
The environment
You can use conda to install all the libraries mentioned above into one isolated programming environment. All you need is to create a file environment.yml by copy-pasting the text below, and then run conda env create -f environment.yml in your terminal.
name: megaenv
channels:
- conda-forge
dependencies:
- numpy
- pandas
- scipy
- geopandas
- xarray
- cdo
- wradlib
- opencv
- scikit-image
- trackpy
- scikit-learn
- tensorflow
- fbprophet
- matplotlib
- seaborn
- bokeh
- folium
- branca
- jupyter
- pylint
- h5py
- numba
- requests
- pytables
- xlrd
- dask
- netCDF4
- bottleneck
- cartopy
- pynio
- pseudonetcdf
- nc-time-axis
- cfgrib
- eccodes
- descartes
- pims
After the installation (which can take a really long time), activate the environment you created by running conda activate megaenv in your terminal. Have fun!