re-release for the article revision
ckdckd145 committed Oct 7, 2024
1 parent c969be5 commit 95a2741
Showing 2 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion article/paper.bib
@@ -28,7 +28,7 @@ @article{Virtanen:2020
doi={10.1038/s41592-019-0686-2}
}
@article{seabold:2010,
-title={Statsmodels: Econometric and statistical modeling with python},
+title={Statsmodels: Econometric and statistical modeling with Python},
author={Seabold, Skipper and Perktold, Josef},
journal={Proceedings of the 9th Python in Science Conference},
volume={57},
10 changes: 5 additions & 5 deletions article/paper.md
@@ -19,24 +19,24 @@ bibliography: paper.bib

# Summary

-Python is one of the most accessible and adaptable programming languages, utilized in various research domains, including statistics. However, few statistical packages inherit these characteristics of Python, leaving researchers unfamiliar with programming languages dependent on other expensive software. To address this gap, `Statmanager-kr` has been developed to provide non-programmers with convenient access to statistical functions. `Statmanager-kr` is designed to be compatible with `Pandas.DataFrame` and enables statistical analyses using a single method with a relatively small number of parameters. With `Scipy` and `Statsmodels` ensuring the validity of analyses, `Statmanager-kr` offers functions for hypothesis testing, comparing between-group and within-group differences, regression, correlations, data visualization, and more.
+Python is one of the most accessible and adaptable programming languages, utilized in various research domains, including statistics. However, few statistical packages inherit these characteristics of Python, leaving researchers unfamiliar with programming languages dependent on other expensive software. To address this gap, `Statmanager-kr` has been developed to provide non-programmers with convenient access to statistical functions. `Statmanager-kr` is designed to be compatible with `Pandas.DataFrame` and enables statistical analyses using a single method with a relatively small number of parameters. With `SciPy` and `statsmodels` ensuring the validity of analyses, `Statmanager-kr` offers functions for hypothesis testing, comparing between-group and within-group differences, regression, correlations, data visualization, and more.

# Statement of need

`Statmanager-kr` is a statistical package for Python in `Pandas`. This package provides methods commonly used for null hypothesis significance testing (NHST), which is of interest to researchers in various fields [@Moon2020]. It is also possible to test for normality or equivariance using the Shapiro-Wilk, Levene, or F<sub>max</sub> tests.

-Most of the statistical software available today is difficult to use, as a previous study reported that one of the challenges students face in statistics courses was "using software" [@Murtonen:2003]. Although there are basic statistical libraries in Python, such as Scipy [@seabold:2010] and Statsmodels [@Virtanen:2020], they are quite complex. While some studies require complex and detailed statistical modeling and analysis, there are also many studies that require only a few hypothesis tests. Therefore, the development of an easy-to-use statistical package would be of great benefit to these researchers.
+Most of the statistical software available today is difficult to use, as a previous study reported that one of the challenges students face in statistics courses was "using software" [@Murtonen:2003]. Although there are basic statistical libraries in Python, such as `SciPy` [@Virtanen:2020] and `statsmodels` [@seabold:2010], they are quite complex. While some studies require complex and detailed statistical modeling and analysis, there are also many studies that require only a few hypothesis tests. Therefore, the development of an easy-to-use statistical package would be of great benefit to these researchers.

-To achieve this, `Statmanager-kr` has been designed to run analyses with only three lines of code: 1. read data as a `Pandas.DataFrame`, 2. create a `Stat_Manager` object, 3. execute the `.progress()` method. Therefore, users can use `Statmanager-kr` as long as they know `Pandas` methods to read the data, such as `.read_csv()` or `.read_excel()`. It also includes functions to visualize the results depending on the analysis method.
+To achieve this, `Statmanager-kr` has been designed to run analyses with only three lines of code: 1) read data as a `Pandas.DataFrame`, 2) create a `Stat_Manager` object, 3) execute the `progress()` method. Therefore, users can use `Statmanager-kr` as long as they know `Pandas` functions to read the data, such as `read_csv()` or `read_excel()`. It also includes functions to visualize the results depending on the analysis method.


# Related Work

Recent advances in the field of statistics have been achieved through the emergence of user-friendly packages, such as `Pingouin` [@vallat2018]. Pingouin is an easy-to-use statistics package that offers a wide range of analytical functions. Like `Pingouin`, `Statmanager-kr` is similar in that it aims to be a user-friendly statistics package.

-However, `Statmanager-kr` and `Pingouin` differ in their target users. Since `Statmanager-kr` is designed for researchers with limited programming experience, it focuses on keeping the workflow short and concise; therefore, `Statmanager-kr` was designed to allow users to apply analyses and obtain results by always running a single method, `.progress()`, in a similar way. On the other hand, `Pingouin` was developed for users with a relatively high level of programming knowledge and experience; therefore, in terms of workflow, `Pingouin` offers more comprehensive and fine-tunable analysis methods and provides more detailed analysis results. Also, `Statmanager-kr` only works with `Pandas.DataFrame`, while `Pingouin` has the advantage of being compatible with a wider range of datasets.
+However, `Statmanager-kr` and `Pingouin` differ in their target users. Since `Statmanager-kr` is designed for researchers with limited programming experience, it focuses on keeping the workflow short and concise; therefore, `Statmanager-kr` was designed to allow users to apply analyses and obtain results by always running a single method, `progress()`, in a similar way. On the other hand, `Pingouin` was developed for users with a relatively high level of programming knowledge and experience; therefore, in terms of workflow, `Pingouin` offers more comprehensive and fine-tunable analysis methods and provides more detailed analysis results. Also, `Statmanager-kr` only works with `Pandas.DataFrame`, while `Pingouin` has the advantage of being compatible with a wider range of data types.

-Another difference is related to visualization and post-hoc. `Statmanager-kr` performs post-hoc by adding the parameter `posthoc` to the `.progress()`. In addition, it is possible to visualize the results by using `.figure()` as a method chaining. Although `Pingouin` does not provide the ability to directly visualize the results of an analysis, it does support the generation of graphs that are very useful from a statistical perspective, such as paired plots, shift plots, and circular mean plots. In addition, Pingouin has the advantage of supporting a wider range of post-hoc tests.
+Another difference is related to post-hoc analysis and visualization. When running an analysis that allows post-hoc analysis, such as ANOVA, in `Statmanager-kr`, post-hoc analysis like Bonferroni correction can be performed by adding the `posthoc` parameter to the `progress` method. It is possible to visualize the results by using `figure()` as a method chaining in `Statmanager-kr`. Although `Pingouin` does not provide the functions to directly visualize the results of an analysis, it does support the generation of graphs that are very useful from a statistical perspective, such as paired plots, shift plots, and circular mean plots. `Pingouin` also has the advantage of supporting a wider range of post-hoc analyses, although it requires a separate method.

In conclusion, depending on the researcher's programming experience and the purpose of the study, `Statmanager-kr` and `Pingouin` can be used differently. Researchers who are familiar with programming may be better suited to use `Pingouin` as it supports more analysis methods and customization. On the other hand, `Statmanager-kr` is designed to be used by researchers who are not familiar with programming and coding, but want to get quick results.
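
The three-step workflow described in the Statement of need (read data into a `Pandas.DataFrame`, create an object, call a single method) can be illustrated with a minimal sketch. This is not the `Statmanager-kr` implementation: `MiniStatManager`, its `column` and `group_column` parameters, and the sample data are hypothetical stand-ins written for this example. Only the `pandas` and `SciPy` calls are real library functions.

```python
import io
from typing import Optional

import pandas as pd
from scipy import stats


class MiniStatManager:
    """Hypothetical stand-in for a single-entry-point statistics wrapper."""

    def __init__(self, dataframe: pd.DataFrame) -> None:
        self.df = dataframe

    def progress(self, method: str, column: str,
                 group_column: Optional[str] = None) -> pd.DataFrame:
        """Run one analysis chosen by name and return a one-row result table."""
        if method == "shapiro":
            # Shapiro-Wilk normality test on a single column.
            statistic, p_value = stats.shapiro(self.df[column])
        elif method == "ttest_ind":
            if group_column is None:
                raise ValueError("ttest_ind needs group_column")
            # Split the measured column by the grouping column.
            samples = [g[column] for _, g in self.df.groupby(group_column)]
            if len(samples) != 2:
                raise ValueError("ttest_ind needs exactly two groups")
            statistic, p_value = stats.ttest_ind(samples[0], samples[1])
        else:
            raise ValueError(f"unknown method: {method}")
        return pd.DataFrame({
            "method": [method],
            "statistic": [float(statistic)],
            "p_value": [float(p_value)],
        })


# Made-up data, for illustration only.
csv_text = """group,score
a,4.1
a,5.0
a,4.6
a,5.3
b,6.2
b,6.8
b,5.9
b,7.1
"""

df = pd.read_csv(io.StringIO(csv_text))  # 1) read data as a DataFrame
sm = MiniStatManager(df)                 # 2) create the manager object
result = sm.progress(method="ttest_ind", column="score", group_column="group")  # 3) run
print(result)
```

Dispatching on a method name keeps the calling code identical across analyses, which is the design goal the paper describes. The trade-off is that each analysis exposes fewer tuning options than calling `SciPy` directly.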

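
The Bonferroni correction mentioned in Related Work as a post-hoc option can be sketched in a few lines. This is a generic illustration of the correction itself, not the code behind the `posthoc` parameter: the function name `bonferroni_adjust` and the raw p-values are invented for the example.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Made-up raw p-values from three pairwise comparisons after an ANOVA.
raw = {("a", "b"): 0.010, ("a", "c"): 0.040, ("b", "c"): 0.500}

adjusted = bonferroni_adjust(raw)
for pair, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(pair, round(p, 3), verdict)
```

With three comparisons, a raw p-value must fall below roughly 0.0167 to stay under 0.05 after adjustment, so the second comparison above no longer counts as significant.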