Title: | Download, Wrangle, and Analyse Vessel Monitoring System Data |
---|---|
Description: | Download, clean and analyse raw Vessel Monitoring System (VMS) data from the Mexican government. You can use the vms_download() function to download raw data, or you can use the sample_dataset provided within the package. You can follow the tutorial in the vignette available at <https://cbmc-gcmp.github.io/dafishr/index.html>. |
Authors: | Fabio Favoretto [aut, cre] |
Maintainer: | Fabio Favoretto <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2025-02-10 05:16:05 UTC |
Source: | https://github.com/cbmc-gcmp/dafishr |
An sf object containing shapefiles of MPA polygons in Mexico
all_mpas
A simple feature collection with 24 features and 5 fields
Name of the MPA in Spanish
Decree category, which defines the type of MPA
State that has jurisdiction over the MPA
Municipality that has jurisdiction over the MPA
General regional localization of the MPA (in Spanish)
column containing geometry details
...
This function eliminates points falling inland by using the st_difference() function from the sf package.
clean_land_points(x, mx_inland = mx_inland)
x |
A data.frame containing latitude and longitude coordinates of vessel tracks, from which inland points will be removed |
mx_inland |
A shapefile shipped with the package representing the inland area of Mexico; it can be loaded with |
Points falling inland in a Vessel Monitoring System (VMS) dataset are obvious mistakes and therefore need to be eliminated from the data.
The function calls a stored shapefile, mx_inland, which is a custom sf object created using a coastline buffer to avoid eliminating points because of a lack of precision within the shapefiles.
The function works with any dataset containing coordinate points in crs = 4326 in columns named latitude and longitude. See the first example with a non-VMS dataset.
A second example below shows the usage on VMS sample data.
A data.frame object
This function can take a while to run. To test it, you can first subsample the data with the dplyr::sample_n() function.
# with non VMS data
x <- data.frame(
  longitude = runif(1000, min = -150, max = -80),
  latitude = runif(1000, min = 15, max = 35)
)
data("mx_inland")
x <- clean_land_points(x, mx_inland)

# using sample_dataset
data("sample_dataset", "mx_inland")
vms_cleaned <- vms_clean(sample_dataset)
vms_no_land <- clean_land_points(vms_cleaned, mx_inland)

# You can check the results by plotting the data
vms_cleaned_sf <- sf::st_as_sf(vms_cleaned, coords = c("longitude", "latitude"), crs = 4326)
vms_no_land_sf <- sf::st_as_sf(vms_no_land, coords = c("longitude", "latitude"), crs = 4326)
library(ggplot2)
ggplot(vms_cleaned_sf) +
  geom_sf(col = "red") +
  geom_sf(data = vms_no_land_sf, col = "black")
# In the provided example only a few inland points are eliminated.
# There are more evident ones within historical data.
The function spatially joins Vessel Monitoring System (VMS) points with the Marine Protected Area (MPA) polygons in Mexico.
join_mpa_data(x, all_mpas = all_mpas)
x |
A data.frame with VMS data that must contain the columns longitude and latitude |
all_mpas |
A shapefile that contains all MPA polygons in Mexico; you can load it using |
It adds five columns, zone, mpa_decree, state, municipality, and region, which hold data from the MPA polygons.
zone contains the name of the MPA (in Spanish); when the vessel is outside any MPA polygon, it is labelled open area.
mpa_decree contains the type of MPA (such as National Park, etc.).
state contains the Mexican state with jurisdiction over the MPA.
municipality contains the Mexican municipality with jurisdiction over the MPA.
region contains the overall location of the MPA (in Spanish).
A data.frame
# Use sample_dataset
data("sample_dataset")
data("all_mpas")
vms_cleaned <- vms_clean(sample_dataset)
vms_mpas <- join_mpa_data(vms_cleaned, all_mpas)

# Plotting data
# Points NOT inside MPA are removed to reduce data size
vms_mpas_sub <- vms_mpas |>
  dplyr::filter(zone != "open area")
vms_mpas_sf <- sf::st_as_sf(vms_mpas_sub, coords = c("longitude", "latitude"), crs = 4326)

# Loading Mexico shapefile
data("mx_shape")

# Map
library(ggplot2)
ggplot(mx_shape) +
  geom_sf(col = "gray90") +
  geom_sf(data = all_mpas, fill = "gray60") +
  geom_sf(data = vms_mpas_sf, aes(col = zone)) +
  theme_void() +
  theme(legend.position = "none")
The function joins port locations to the data using buffers around ports. It uses the mx_ports data, which is provided by INEGI: https://en.www.inegi.org.mx/
join_ports_locations(x, mx_ports = mx_ports, buffer_size = 0.15)
x |
A data.frame with latitude and longitude coordinates |
mx_ports |
A shapefile of point data storing the coordinates of ports and marinas in Mexico; you can load it using |
buffer_size |
A number (double) giving the size of the buffer to apply around each port |
The function adds a location column indicating whether the vessel was at port or at sea.
A data.frame
# With sample data
data("sample_dataset")
data("mx_ports")
vms_cleaned <- vms_clean(sample_dataset)

# It is a good idea to subsample when testing... it takes a while on the full data!
vms_subset <- dplyr::sample_n(vms_cleaned, 1000)
with_ports <- join_ports_locations(vms_subset)
with_ports_sf <- sf::st_as_sf(with_ports, coords = c("longitude", "latitude"), crs = 4326)

data("mx_shape")
library(ggplot2)
ggplot(mx_shape) +
  geom_sf(col = "gray90") +
  geom_sf(data = with_ports_sf, aes(col = location)) +
  facet_wrap(~location) +
  theme_bw()
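The idea of tagging positions with a port buffer can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper tag_port_or_sea(), the "port" and "at_sea" labels, the toy coordinates, and the use of a plain planar distance in degrees are all assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of dafishr: tags each position as "port" or
# "at_sea" depending on whether it lies within `buffer_size` of any port.
# Assumption: distance is a simple planar distance in decimal degrees.
tag_port_or_sea <- function(x, ports, buffer_size = 0.15) {
  # Distance from every vessel position to its nearest port
  nearest <- vapply(seq_len(nrow(x)), function(i) {
    min(sqrt((ports$longitude - x$longitude[i])^2 +
             (ports$latitude - x$latitude[i])^2))
  }, numeric(1))
  x$location <- ifelse(nearest <= buffer_size, "port", "at_sea")
  x
}

# Toy port and track coordinates (made up for this example)
ports <- data.frame(
  longitude = c(-110.31, -106.42),
  latitude = c(24.16, 23.20)
)
tracks <- data.frame(
  longitude = c(-110.30, -108.00, -106.45),
  latitude = c(24.17, 22.00, 23.25)
)

tagged <- tag_port_or_sea(tracks, ports)
table(tagged$location)
```

A larger buffer_size classifies more positions as being at port, so the value is a trade-off between catching vessels moored near a port and mislabelling vessels passing close to it.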
This function uses normalmixEM from the mixtools package to model the speed of vessels and estimate their behavior, specifically whether the vessel was fishing or cruising.
model_vms(df)
df |
a data.frame preprocessed using the |
A data.frame with a vessel_state column with the type of model implemented
preprocessing_vms(sample_dataset, destination.folder = tempdir())
df <- fst::read_fst(paste0(tempdir(), "/vms_2019_1_1_10_preprocessed.fst"))
model_vms(df)
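To give an intuition for how a two-component normal mixture can separate slow and fast speeds, here is a small hand-written EM sketch in base R. It is a stand-in written for illustration, not a call to mixtools::normalmixEM and not the code used by model_vms(); the helper fit_two_speeds(), the simulated speeds, and the rule that the slower component means "fishing" are assumptions.

```r
# Hypothetical helper, NOT the package's implementation: a basic EM fit of a
# two-component Gaussian mixture to a numeric vector of speeds.
fit_two_speeds <- function(speed, max_iter = 200, tol = 1e-8) {
  # Initial guesses: quartiles as means, overall sd, equal mixing weights
  mu <- as.numeric(quantile(speed, c(0.25, 0.75)))
  sigma <- rep(sd(speed), 2)
  lambda <- c(0.5, 0.5)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each speed belongs to each component
    d1 <- lambda[1] * dnorm(speed, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(speed, mu[2], sigma[2])
    total <- d1 + d2
    p1 <- d1 / total
    p2 <- 1 - p1
    # M-step: update weights, means and standard deviations
    lambda <- c(mean(p1), mean(p2))
    mu <- c(sum(p1 * speed) / sum(p1), sum(p2 * speed) / sum(p2))
    sigma <- c(sqrt(sum(p1 * (speed - mu[1])^2) / sum(p1)),
               sqrt(sum(p2 * (speed - mu[2])^2) / sum(p2)))
    # Stop when the log-likelihood no longer changes
    loglik <- sum(log(total))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, lambda = lambda, posterior_slow = p1)
}

# Simulated speeds in knots (made up): a slow group and a fast group
set.seed(42)
speed <- c(rnorm(300, mean = 2, sd = 0.5), rnorm(300, mean = 9, sd = 1))
fit <- fit_two_speeds(speed)

# Assumption: the slower component is interpreted as fishing
vessel_state <- ifelse(fit$posterior_slow > 0.5, "fishing", "cruising")
table(vessel_state)
```

Real VMS speeds are rarely this cleanly separated, so the posterior probabilities are usually more informative than a hard 0.5 cut-off.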
An sf object containing shapefiles of buffers around remote MPAs in Mexico.
The buffer equals the area inside each MPA polygon and was created to assess differences in fishing activity inside or outside each of the remote MPAs.
mpas_buffers
A simple feature collection with 5 features and 2 fields
Name of the MPA to which the buffer corresponds
empty
column containing geometry details
...
this project
An sf object containing the Mexican coastline shapefile
mx_coastline
A simple feature collection with 177 features and 3 fields
Name of the object
resolution rank
zoom precision
column containing geometry details
...
https://cran.r-project.org/package=rnaturalearth
An sf object containing a buffer around the Mexican coastline that was used to create the inland shapefile available in this package.
mx_coastline_buffer
A simple feature collection with 1 feature and 3 fields
Name of the object
resolution rank
zoom precision
column containing geometry details
...
https://cran.r-project.org/package=rnaturalearth
An sf object containing the shapefile representing Mexico
mx_eez
A simple feature collection with 1 feature and 2 fields
empty
empty
column containing geometry details
...
An sf object containing shapefiles of the Mexican EEZ in the Pacific
mx_eez_pacific
A simple feature collection with 1 feature and 1 field
Mexican Pacific Exclusive Economic Zone
column containing geometry details
...
An sf object containing shapefiles of the inland area of Mexico
mx_inland
A simple feature collection with 1 feature and 2 fields
Mexico
empty
column containing geometry details
...
modified from Mexican shapefile
An sf object containing points representing the locations of ports and marinas in Mexico
mx_ports
A simple feature collection with 237 features and 2 fields
Type of infrastructure: either Puerto (Port) or Marina
Name of the infrastructure (i.e. port or marina)
column containing geometry details
...
An sf object containing a shapefile of Mexico
mx_shape
A simple feature collection with 1 feature and 2 fields
Mexico
empty
column containing geometry details
...
A data.frame object containing catch data for each vessel from 2008 to 2021.
Vessels are only from the Pacific, and catches include only tuna, sharks, and marlin.
The dataset was created by wrangling and filtering the raw data (available on request from the authors).
pacific_landings
A data.frame with 23,231 rows and 5 columns
Date of the catch report
Vessel RNP unique ID code
Official name of the vessel
Final weight of the catch in tons
Days at sea that were declared at port
...
Data are available on request from CONAPESCA; a raw version of the data is available on request from the authors.
A data.frame object extracted from a raw dataset of permits available on request at dataMares (https://datamares.org/)
pelagic_vessels_permits
A data.frame with 719 rows and 2 columns.
Unique code identifying the vessel
Name of the vessel
...
This function bundles all the cleaning functions and allows them to be easily used in parallel processing to speed up the cleaning of all the Vessel Monitoring System (VMS) data .csv files.
While it runs, it creates a folder called preprocessed that stores the VMS data that underwent preprocessing. If multiple files are used as input (see examples below), it creates multiple files. All outputs are in .fst format, which allows fast loading of large files.
See the fst package documentation for further information: https://www.fstpackage.org/.
preprocessing_vms(files.path, destination.folder)
files.path |
Either a path to the downloaded file or the data object itself.
If the function is used with a path, it adds a |
destination.folder |
The path to a folder where all the preprocessed files will be stored. |
A .fst file for each input file, saved within a directory chosen by the user, which is created automatically if it does not exist.
# An example with the `sample_dataset`
preprocessing_vms(sample_dataset, destination.folder = tempdir())
An sf object containing shapefiles of remote MPA polygons in Mexico that are of particular conservation interest
remote_mpas
A simple feature collection with 5 features and 2 fields
Name of the remote MPA in Spanish
empty
column containing geometry details
...
A data.frame object extracted from a raw dataset of Vessel Monitoring System (VMS) data from the year 2019.
sample_dataset
A data.frame with 10,000 rows and 9 columns.
Name of the vessel
Unique code identifying the vessel
Base port where the vessel is officially registered
Owner of the vessel or partnership name
Date as "%d/%m/%Y %H:%M"
Latitude degree in WGS84, crs = 4326, of the position of the vessel
Longitude degree in WGS84, crs = 4326, of the position of the vessel
Speed in knots of the vessel at that specific time
Direction in degrees of the vessel at that specific time
...
This function cleans raw Vessel Monitoring System (VMS) data files: it cleans the columns, eliminates NULL values in coordinates, parses dates, and returns a data.frame.
vms_clean(path_to_data)
path_to_data |
Either a path to the downloaded file or the data object itself.
If the function is used with a path, it adds a |
It takes a raw data file downloaded using the vms_download() function, either by specifying its path directly or by referencing a data.frame already stored as an R object.
If a path is used, a column with the name of the raw file is added for future reference.
It also splits the date into three new columns, year, month, and day, and retains the original date column.
This function can be used with apply functions over a list of files, or it can be parallelized using furrr functions.
A data.frame
# Using sample dataset, or a data.frame already stored as an object
# It is possible to use a path directly as argument
data("sample_dataset")
cleaned_vms <- vms_clean(sample_dataset)
head(cleaned_vms)
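The cleaning steps described above (dropping NULL coordinates, parsing the date, and splitting it into year, month, and day) can be sketched in base R as follows. This is only an illustration of the idea, not the code inside vms_clean(); the toy data.frame, its column names, and the UTC time zone are assumptions.

```r
# Toy raw data (made up); real VMS files may use different column names
raw <- data.frame(
  date = c("01/01/2019 00:15", "15/03/2019 13:40", "16/03/2019 08:05"),
  latitude = c("24.16", "NULL", "23.20"),
  longitude = c("-110.31", "-109.00", "-106.42"),
  stringsAsFactors = FALSE
)

# Drop rows whose coordinates are recorded as the string "NULL"
keep <- raw$latitude != "NULL" & raw$longitude != "NULL"
clean <- raw[keep, ]
clean$latitude <- as.numeric(clean$latitude)
clean$longitude <- as.numeric(clean$longitude)

# Parse the date using the "%d/%m/%Y %H:%M" format (time zone is an assumption)
parsed <- as.POSIXct(clean$date, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Split into year, month and day while keeping the original date column
clean$year <- as.integer(format(parsed, "%Y"))
clean$month <- as.integer(format(parsed, "%m"))
clean$day <- as.integer(format(parsed, "%d"))

clean
```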
This function downloads data from the Datos Abiertos initiative.
vms_download(
  year = lubridate::year((Sys.time())) - 1,
  destination.folder,
  check.url.certificate = TRUE
)
year |
Year of data to download; defaults to the previous year. A vector of years can also be used. |
destination.folder |
Folder into which the data will be downloaded. Defaults to the working directory. |
check.url.certificate |
Logical. On Ubuntu systems the function might raise a certificate error; you can deactivate the certificate check by setting this to |
Data are downloaded from this link: https://www.datos.gob.mx/busca/dataset/localizacion-y-monitoreo-satelital-de-embarcaciones-pesqueras/
The data are downloaded and decompressed into a VMS-data folder in a location chosen by the user by specifying a path in destination.folder.
If a location is not specified, data are downloaded to the current working directory by default.
Within the main folder, data are organized in separate folders by month (with Spanish names), and each folder contains multiple .csv files, each holding two weeks of data points.
Saves downloaded data into a folder called VMS-data within the directory specified.
# Download single year
# On Ubuntu it raises a certificate error when downloading; testing on Windows and macOS
# does not raise that error and you can use the default certificate checking.
vms_download(2019, destination.folder = tempdir(), check.url.certificate = FALSE)
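Once the data are on disk, the monthly folder layout described above can be walked with base R to collect every .csv file. The sketch below builds a small mock VMS-data tree in a temporary directory so that it runs without downloading anything; the month folder names and file names are placeholders, not the real names produced by vms_download().

```r
# Build a mock VMS-data folder (placeholder names, no real data)
root <- file.path(tempfile("vms_demo_"), "VMS-data")
dir.create(file.path(root, "ENERO"), recursive = TRUE)
dir.create(file.path(root, "FEBRERO"), recursive = TRUE)
invisible(file.create(file.path(root, "ENERO", c("part_1.csv", "part_2.csv"))))
invisible(file.create(file.path(root, "FEBRERO", "part_1.csv")))

# Collect all .csv files across the monthly sub-folders
csv_files <- list.files(
  root,
  pattern = "\\.csv$",
  recursive = TRUE,
  full.names = TRUE
)
length(csv_files)

# With the package loaded, each real file could then be cleaned in turn, e.g.:
# cleaned <- lapply(csv_files, vms_clean)
```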