Merge pull request #13 from eurostat/dev
adding rhub_check
mmatyi authored Jan 26, 2025
2 parents d7ad328 + cb2c106 commit a906ec0
Showing 10 changed files with 124 additions and 27 deletions.
95 changes: 95 additions & 0 deletions .github/workflows/rhub.yaml
@@ -0,0 +1,95 @@
# R-hub's generic GitHub Actions workflow file. It's canonical location is at
# https://github.com/r-hub/actions/blob/v1/workflows/rhub.yaml
# You can update this file to a newer version using the rhub2 package:
#
# rhub::rhub_setup()
#
# It is unlikely that you need to modify this file manually.

name: R-hub
run-name: "${{ github.event.inputs.id }}: ${{ github.event.inputs.name || format('Manually run by {0}', github.triggering_actor) }}"

on:
  workflow_dispatch:
    inputs:
      config:
        description: 'A comma separated list of R-hub platforms to use.'
        type: string
        default: 'linux,windows,macos'
      name:
        description: 'Run name. You can leave this empty now.'
        type: string
      id:
        description: 'Unique ID. You can leave this empty now.'
        type: string

jobs:

  setup:
    runs-on: ubuntu-latest
    outputs:
      containers: ${{ steps.rhub-setup.outputs.containers }}
      platforms: ${{ steps.rhub-setup.outputs.platforms }}

    steps:
      # NO NEED TO CHECKOUT HERE
      - uses: r-hub/actions/setup@v1
        with:
          config: ${{ github.event.inputs.config }}
        id: rhub-setup

  linux-containers:
    needs: setup
    if: ${{ needs.setup.outputs.containers != '[]' }}
    runs-on: ubuntu-latest
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.containers) }}
    container:
      image: ${{ matrix.config.container }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/run-check@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}

  other-platforms:
    needs: setup
    if: ${{ needs.setup.outputs.platforms != '[]' }}
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.platforms) }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/setup-r@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/run-check@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
7 changes: 3 additions & 4 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: restatapi
Type: Package
Title: Search and Retrieve Data from Eurostat Database
-Date: 2024-09-30
-Version: 0.23.2
+Date: 2025-01-27
+Version: 0.24.1
Encoding: UTF-8
Authors@R: c(person("Mátyás", "Mészáros", email = "[email protected]", role = c("aut", "cre")),
person("Sebastian", "Weinand", role = "ctb"))
@@ -17,5 +17,4 @@ Suggests: chron, knitr, rmarkdown, tinytest, remotes
NeedsCompilation: no
URL: https://github.com/eurostat/restatapi
BugReports: https://github.com/eurostat/restatapi/issues
-RoxygenNote: 7.3.1
-Packaged: 2021-10-06 08:46:52 UTC; mmeszaros
+RoxygenNote: 7.3.2
9 changes: 9 additions & 0 deletions NEWS.md
@@ -1,3 +1,12 @@
+# restatapi 0.24.1
+
+- correction of outdated URLs and documentation
+
+# restatapi 0.24.0
+
+- correction of extraction of flags
+- update of the `get_eurostat_toc()` function because of change in the API response
+
# restatapi 0.23.2

- correction of tests because change how the API handles confidentially suppressed data
2 changes: 2 additions & 0 deletions R/extract_data.R
@@ -73,6 +73,8 @@ extract_data<-function(xml_lf,keep_flags=FALSE,stringsAsFactors=FALSE,bulk=TRUE,
dr$OBS_STATUS<-""
} else{
f<-gsub("na","",f)
+f<-tolower(f)
+f<-gsub("@","",f)
dr$OBS_STATUS<-paste0(f,collapse="")
}
}
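The two lines added to `extract_data()` normalise the observation flags before they are concatenated into `OBS_STATUS`. The standalone sketch below replays the same sequence of `gsub()`/`tolower()` calls so the effect of each step is visible; the function name `normalise_flags` and the input values are hypothetical, made up for illustration, and real flag values returned by the Eurostat API may look different.

```r
# Hypothetical helper replaying the flag clean-up steps from extract_data()
normalise_flags <- function(f) {
  f <- gsub("na", "", f)    # remove "na" placeholders (case-sensitive, applied before lower-casing)
  f <- tolower(f)           # step added in this commit: lower-case the flags
  f <- gsub("@", "", f)     # step added in this commit: strip "@" characters
  paste0(f, collapse = "")  # concatenate into a single OBS_STATUS string
}

# Made-up input for illustration only
normalise_flags(c("na", "B", "@P"))  # "bp"
```

Because the `"na"` removal runs before `tolower()`, an upper-case `"NA"` would not be removed by this sequence.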
10 changes: 5 additions & 5 deletions R/get_eurostat_toc.R
@@ -19,10 +19,10 @@
#' \code{lastModified}\tab The date when the structure of the dataset/table was last time modified\cr
#' \code{dataStart}\tab The start date of the data in the dataset/table\cr
#' \code{dataEnd}\tab The end date of the data in the dataset/table\cr
-#' \code{values}\tab The number of values in the dataset/table, and it is filled only if the download
-#' \code{mode} is "xml"\cr
+#' \code{values}\tab The number of values in the dataset/table, and it is filled only if the download \code{mode} is "xml"\cr
#' \code{unit}\tab The unit name for tables in the language provided by the \code{lang} parameter, for
#' dataset it is empty and this column exists only if the download \code{mode} is "xml"\cr
+#' \code{source}\tab The source of the data and it is filled only if the download \code{mode} is "xml"\cr
#' \code{shortDescription}\tab The short description of the values for tables in the language provided by the
#' \code{lang} parameter, for dataset it is empty and this column exists only if the download \code{mode} is "xml"\cr
#' \code{metadata.html}\tab The link to the metadata in html format, and this column exists only if the
@@ -171,12 +171,12 @@ get_eurostat_toc<-function(mode="xml",
})
}
if (exists("leafs")){
-toc<-data.table::rbindlist(leafs,fill=TRUE)[,c(1:19)]
+toc<-data.table::rbindlist(leafs,fill=TRUE)[,-'children']
type<-as.character(unlist(lapply(xml_leafs,xml2::xml_attr,attr="type")))
toc<-cbind(toc,type)
-keep<-c(paste0("title.",lang),"code","type","lastUpdate","lastModified","dataStart","dataEnd","values",paste0("unit.",lang),paste0("shortDescription.",lang),"metadata.html","metadata.sdmx","downloadLink.tsv")
+keep<-c(paste0("title.",lang),"code","type","lastUpdate","lastModified","dataStart","dataEnd","values",paste0("unit.",lang),paste0("source.",lang),paste0("shortDescription.",lang),"metadata.html","metadata.sdmx","downloadLink.tsv")
toc<-toc[,keep,with=FALSE]
-names(toc)<-c("title","code","type","lastUpdate","lastModified","dataStart","dataEnd","values","unit","shortDescription","metadata.html","metadata.sdmx","downloadLink.tsv")
+names(toc)<-c("title","code","type","lastUpdate","lastModified","dataStart","dataEnd","values","unit","source","shortDescription","metadata.html","metadata.sdmx","downloadLink.tsv")
}
}
}
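In `get_eurostat_toc()` the positional subset `[,c(1:19)]` is replaced by `[,-'children']`, so the columns of the `rbindlist()` result are now chosen by name (everything except `children`) rather than by position. Presumably this keeps newly added fields, such as the `source` columns, available for the `keep` selection that follows, which is also why the test now expects 14 columns instead of 13. A minimal sketch of the difference, assuming the `data.table` package is installed; the table `leaves` and its columns other than `children` are made up for illustration.

```r
library(data.table)

# Made-up stand-in for the rbindlist() result; column names are illustrative
leaves <- data.table(code = c("a", "b"), title = c("A", "B"),
                     children = c("", ""), source = c("x", "y"))

# Positional selection keeps only the first k columns, whatever they are
by_position <- leaves[, 1:2]
names(by_position)  # "code" "title"

# Name-based exclusion keeps every column except 'children'
by_name <- leaves[, -"children"]
names(by_name)      # "code" "title" "source"
```

With positional selection, any column appended after position k is silently dropped; excluding by name does not depend on column order.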
2 changes: 1 addition & 1 deletion R/load_cfg.R
@@ -28,7 +28,7 @@
#' This configuration code sets up the parallel processing to handle large XML files efficiently. By default if there is more then 4 cores/logical processors and at least 32 GB of RAM then
#' 4 cores are used for parallel computing. If there is more then 2 cores then 2 cores are used. This default configuration can be overwritten with \code{options(restatapi_cores=...)} or with the \code{max_cores=TRUE} parameter.
#' In the second case part of the computation distributed over the maximum number minus one cores. By using the \code{max_cores=TRUE} option there is a higher probability that the program will run out off memory for larger datasets.
-#' In addition, the list of country codes are loaded to the variable \code{cc} (country codes), based on the \href{https://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL&StrNom=CL_GEO&StrLanguageCode=EN&IntPcKey=42277583&IntResult=1&StrLayoutCode=HIERARCHIC}{Eurostat standard code list}
+#' In addition, the list of country codes are loaded to the variable \code{cc} (country codes), based on the \href{https://webgate.ec.europa.eu/fusionregistry/sdmx/v2/structure/codelist/ESTAT/SCL_GEO_EUEFTACC/1.0}{Eurostat standard code list}
#' @examples
#' \donttest{
#' load_cfg(parallel=FALSE)
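The `load_cfg()` documentation describes the default number of cores in words. The function below restates that rule as code so the thresholds are easy to check. It is a hypothetical sketch written from the documentation, not the package's implementation: `default_cores` is an invented name, the caller is assumed to already know the core count and RAM size, the override via `options(restatapi_cores=...)` is not modelled, and the single-core fallback for two or fewer cores is an assumption because the text does not state that case.

```r
# Hypothetical restatement of the default core rule described for load_cfg();
# not the package's own code
default_cores <- function(n_cores, ram_gb, max_cores = FALSE) {
  if (max_cores) {
    # max_cores=TRUE: use the maximum number of cores minus one
    return(max(1L, as.integer(n_cores) - 1L))
  }
  if (n_cores > 4 && ram_gb >= 32) return(4L)  # more than 4 cores and at least 32 GB RAM
  if (n_cores > 2) return(2L)                   # more than 2 cores
  1L                                            # assumption: otherwise run serially
}

default_cores(8, 64)                    # 4
default_cores(8, 16)                    # 2
default_cores(8, 16, max_cores = TRUE)  # 7
```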
18 changes: 5 additions & 13 deletions README.md
@@ -5,7 +5,7 @@ status](https://github.com/eurostat/restatapi/workflows/R-CMD-check/badge.svg)](
[![dependencies](https://tinyverse.netlify.app/badge/restatapi)](https://CRAN.R-project.org/package=restatapi)
[![CRAN version](https://www.r-pkg.org/badges/version/restatapi)](https://CRAN.R-project.org/package=restatapi )
[![CRAN status](https://badges.cranchecks.info/summary/restatapi.svg)](https://cran.r-project.org/web/checks/check_results_restatapi.html)
-[![license](https://img.shields.io/badge/license-EUPL-success)](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
+[![license](https://img.shields.io/badge/license-EUPL-success)](https://interoperable-europe.ec.europa.eu/collection/eupl/eupl-text-eupl-12)
[![weekly downloads](https://cranlogs.r-pkg.org/badges/last-week/restatapi)](https://mybinder.org/v2/gh/mmatyi/restatapi_logs/b1320a7cd483638e1f12c8a1f5bf595cbbc32233?urlpath=shiny/ShinyApps/cran_stat/)
[![monthly downloads](https://cranlogs.r-pkg.org/badges/restatapi)](https://mybinder.org/v2/gh/mmatyi/restatapi_logs/b1320a7cd483638e1f12c8a1f5bf595cbbc32233?urlpath=shiny/ShinyApps/restatapi/)
[![all downloads](https://cranlogs.r-pkg.org/badges/grand-total/restatapi)](https://mmatyi.github.io/restatapi_logs/)
@@ -14,15 +14,6 @@ status](https://github.com/eurostat/restatapi/workflows/R-CMD-check/badge.svg)](
# restatapi
An R package to search and retrieve data from Eurostat database using SDMX

-# <span style="color:red">IMPORTANT changes with the new Eurostat API</span>
-
-Version 0.20.0 enables all the functionality for the [new dissemination chain](https://wikis.ec.europa.eu/display/EUROSTATHELP/Developer%27s+corner) and from version 0.20.3 it is the default API.
-
-The new API has **breaking changes** concerning the `date_filter`. In the old dissemination chain the value was assigned to *the first day* of the month, quarter and year, so it was enough to filter for one day to get the value for the whole period. Under the new API the value belongs to the full period. If a date range does not cover the whole period no value is returned. For example, to get the value of the whole quarter the date filter should start at least on the first date of the quarter and end at least on the last day of the quarter. With exact numerical example to get the value for 2022/Q3, the `startDate` should be 2022-07-01 or earlier and the `endDate` should be 2022-09-30 or later. In the old version of the API it was enough if the period included the day 2022-07-01 only.
-
-In addition to this change, if the date filter is only one day (e.g. `startDate=2007-07-02&endDate=2007-07-02`) then the new API gives back the values for all the time periods in the dataset applying the filter provided for the other concepts. But if the time period changes to more than one day (e.g. `startDate=2007-07-01&endDate=2007-07-02`) then the new API gives back only those values which are covered by the range. For more details see the updated description of the numerical examples in [Example 6](#updated-date-filter).
-
-
## installation

'restatapi' can be installed from [CRAN](https://CRAN.R-project.org/package=restatapi) by
@@ -57,7 +48,7 @@ The package contains 8 main functions and several other sub functions in 3 areas

Below there are examples demonstrating the main features, the detailed documentation of the functions is in the package.

-Next to the functions the package contains a list of country codes for different groups of European countries based on the [Eurostat standard code list](https://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL&StrNom=CL_GEO&StrLanguageCode=EN&IntPcKey=48517911&StrLayoutCode=HIERARCHIC), e.g.: European Union (EU28, ..., EU6), Euro Area (EA19, ..., EA11) or New Member States (NMS13, ..., NMS2).
+Next to the functions the package contains a list of country codes for different groups of European countries based on the [Eurostat standard code list](https://webgate.ec.europa.eu/fusionregistry/sdmx/v2/structure/codelist/ESTAT/SCL_GEO_EUEFTACC/1.0), e.g.: European Union (EU28, ..., EU6), Euro Area (EA19, ..., EA11) or New Member States (NMS13, ..., NMS2).

## 10 examples

@@ -104,8 +95,9 @@ options(restatapi_cache_dir=file.path(tempdir(),"restatapi"))
```
<a name="updated-date-filter"></a>
**Example 6:** First download the annual (`select_freq="A"`) air passenger transport data for the main airports of Montenegro (`avia_par_me`) and do not cache any of the data (`cache=FALSE`). Then from the same table download the monthly (`select_freq="M"`) and quarterly (`filters="Q...`) data for 2 specific airport pairs/routes (`filters=...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK"`) in August 2016 and on 1 July 2017 (`date_filter=c("2016-08","2017-07-01")`). The filters are provided in the format how it is required by the [REST SDMX web service](https://wikis.ec.europa.eu/pages/viewpage.action?pageId=44165555). Under the old API, it returned the value for the selected routes for the month August 2016, July 2017 and the 3rd quarter of 2017. Meanwhile under the ***new API***, it returns all the quarterly and monthly value, as there is a single day in the `date_filter`.
-Then download again the monthly and quarterly data (`filters=c("Quarterly","Monthly")`) where there is exact match in the DSD for "HU" for August 2016 and 1 March 2014 (`date_filter=c("2016-08","2014-03-01")`). This query will provide only monthly data for 2016, as the quarterly data is always assigned to the first month of the quarter and there is no data for 2014. Since there is no exact match for the "HU" pattern, it returned all the monthly data for August 2016 and put the labels (like the name of the airports and units) so the data can be easier understood (`label=TRUE`) under the old API. Under the ***new API***, it returns all the quarterly and monthly data as there is a single day in the `date_filter`.
-Finally, download only the quarterly data (`select_freq="Q"`) for several time periods (`date_filter=c("2017-03",2016,"2017-07-01",2012:2014)`, the order of the dates does not matter) where the "HU" pattern can be found anywhere, but only in the `code` column of the DSD (`filters="HU",exact_match=FALSE,name=FALSE`). The result was all the statistics about flights from Montenegro to Hungary in the 3rd quarter of 2017, as there is no information for the other time periods under the old API. Under the ***new API***, it gives back all the quarterly data in dataset for flights from Montenegro to Hungary because in the `date_filter` there is a single day.
+Then download again the monthly and quarterly data (`filters=c("Quarterly","Monthly")`) where there is exact match in the DSD for "HU" for August 2016 and 1 March 2014 (`date_filter=c("2016-08","2014-03-01")`). This query will provide only monthly data for 2016, as the quarterly data is always assigned to the first month of the quarter and there is no data for 2014. Since there is no exact match for the "HU" pattern, it returned all the monthly data for August 2016 and put the labels (like the name of the airports and units) so the data can be easier understood (`label=TRUE`) under the old API. Under the ***current API***, it returns all the quarterly and monthly data as there is a single day in the `date_filter`.
+Finally, download only the quarterly data (`select_freq="Q"`) for several time periods (`date_filter=c("2017-03",2016,"2017-07-01",2012:2014)`, the order of the dates does not matter) where the "HU" pattern can be found anywhere, but only in the `code` column of the DSD (`filters="HU",exact_match=FALSE,name=FALSE`). The result was all the statistics about flights from Montenegro to Hungary in the 3rd quarter of 2017, as there were no information for the other time periods under the old API. Under the ***current API***, it gives back all the quarterly data in the dataset for flights from Montenegro to Hungary because in the `date_filter` there is a single day.
+Before 2022, in the old dissemination chain the value was assigned to *the first day* of the month, quarter and year, so it was enough to filter for one day to get the value for the whole period. Under the current API the value belongs to the full period. If a date range does not cover the whole period no value is returned. For example, to get the value of the whole quarter the date filter should start at least on the first date of the quarter and end at least on the last day of the quarter. With exact numerical example to get the value for 2022/Q3, the `startDate` should be 2022-07-01 or earlier and the `endDate` should be 2022-09-30 or later. In the old version of the API it was enough if the period included the day 2022-07-01 only.

```R
dt<-get_eurostat_data("avia_par_me",select_freq="A",cache=FALSE)
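The updated README explains that, under the current API, a value is returned only when the date range covers the whole period; for 2022/Q3 that means a `startDate` of 2022-07-01 or earlier and an `endDate` of 2022-09-30 or later. The helper below computes the first and last day of a quarter, i.e. the narrowest range that still covers it. `quarter_bounds()` is a hypothetical name, not part of 'restatapi', and the sketch uses only base R date arithmetic.

```r
# Hypothetical helper: first and last calendar day of a quarter
quarter_bounds <- function(year, quarter) {
  year <- as.integer(year)
  quarter <- as.integer(quarter)
  stopifnot(quarter %in% 1:4)
  first_month <- 3L * (quarter - 1L) + 1L
  start <- as.Date(sprintf("%d-%02d-01", year, first_month))
  # first day of the following quarter, minus one day
  end <- seq(start, by = "3 months", length.out = 2L)[2L] - 1L
  list(start = start, end = end)
}

b <- quarter_bounds(2022, 3)
format(b$start)  # "2022-07-01"
format(b$end)    # "2022-09-30"
```

Computing the end as "next quarter start minus one day" avoids hard-coding month lengths and handles leap years automatically.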
2 changes: 1 addition & 1 deletion inst/tinytest/test_restatapi.R
@@ -48,7 +48,7 @@ txt_toc<-get_eurostat_toc(mode="txt")
t2<-system.time({get_eurostat_toc()})[3]
expect_warning(get_eurostat_toc(mode="text")) # 1
if (!is.null(xml_toc)){
-expect_equal(ncol(xml_toc),13) # 2
+expect_equal(ncol(xml_toc),14) # 2
expect_true(exists("toc.xml.en",envir=restatapi::.restatapi_env)) # 3
if (!is.null(txt_toc)){
expect_equal(ncol(txt_toc),8) # 4
4 changes: 2 additions & 2 deletions man/get_eurostat_toc.Rd

Diff not rendered (generated file).

2 changes: 1 addition & 1 deletion man/load_cfg.Rd

Diff not rendered (generated file).
