--- title: "Where to start" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Where to start} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r message=FALSE, warning=FALSE, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Downloading raw data on your computer Vessel Monitoring System (VMS) data comes organized by year. The function `vms_download()` automatically downloads it and store into the working directory in a VMS-data folder. Within the folder raw data are organized by monthly folders (with names in Spanish) that contain several .csv files that usually store byweekly data intervals. Each file have different rows and some have different column names. For that we highly recommend to use the `vms_clean()` function. The latter corrects several inconsistencies within the raw data. If you have any suggestion or spot some errors we will be very pleased if you create an issue. The function below downloads data from the year 2019. ```{r eval=FALSE} library(dafishr) vms_download(year = 2019, destination.folder = getwd()) ``` ## Using the cleaning functions The first `vms_clean()` function works on the VMS `data.frame`. You can either load downloaded data or use the `sample_dataset` that you can call and clean like so: ```{r eval=FALSE} library(dafishr) data("sample_dataset") vms_cleaned <- vms_clean(sample_dataset) ``` The `vms_clean()` function returns a message with the number of rows that were cleaned because they contained `NULL` values in coordinates. ## Spatial wrangling Once the dataset is wrangled, there are some other preprocessing steps to follow. First, all points that fall inland should be eliminated. This is because VMS data are vessels, thus points falling inland are errors in data registration. For that we will upload the `mx_inland` shapefile which helps eliminating all the points within a certain distance from the coastline. ```{r eval=FALSE} data("mx_inland") # Shapefile of inland Mexico area vms_cleaned_land <- clean_land_points(vms_cleaned, mx_inland) ``` ### Associating port data Once all land points are eliminated, we can use the `join_ports_locations()` function to label all the points where a vessel was inside a port or a marina. We achieve this by using the `mx_ports` shapefile that will be used to create a buffer around each port or marina location. Then each VMS point that falls within these buffers will be labelled as `at_port` or `at_sea` in a new column that will be automatically called `location`. ```{r eval=FALSE, message=FALSE, warning=FALSE} data("mx_ports") # If you are just testing, it is a good idea to subsample... # it takes a while on the full data! vms_subset <- dplyr::sample_n(vms_cleaned, 1000) with_ports <- join_ports_locations(vms_subset) ``` Now we can check the results in a map: ```{r eval=FALSE} with_ports_sf <- sf::st_as_sf(with_ports, coords = c("longitude", "latitude"), crs = 4326) data("mx_shape") library(ggplot2) ggplot2::ggplot(mx_shape) + geom_sf(col = "gray90") + geom_sf(data = with_ports_sf, aes(col = location)) + facet_wrap(~ location) + theme_bw() ```