Taxonomy
What?
Taxonomy.jl aims to serve as a comprehensive database of structural equation models (SEMs) that can be used to infer distributions of both structures (e.g., types of models, numbers of observed and latent variables) and parameters (e.g., the average factor loading). This will greatly facilitate simulation studies that accurately reflect real-world conditions, and it takes seriously the idea that "Simulation studies are to a statistician what experiments are to a scientist" (Pawel & Kook et al.). Having a common basis for setting parameters in simulations will also reduce the extremely wide latitude that statisticians have to create an overly positive image of the strengths of novel methods. So-called researcher degrees of freedom are already a concern in empirical studies, but simulation studies exacerbate the issue by allowing almost unlimited freedom over the data-generating process. Additionally, Taxonomy.jl will provide a user-friendly interface that lets researchers easily sample these parameters for use in their own simulation studies.
End product
A Julia package that enables filtering a taxonomy database and constructing samplers for structures and parameters. These samplers can quickly be turned into models for StructuralEquationModels.jl.
So what?
Simulations are only helpful insofar as they reflect some (simplified) aspects of reality. A simulation that is based only on the guesses of the researchers conducting it is prone to misrepresenting realistic conditions and may even unduly favour the procedures under investigation. Being able to base simulations on a sample of the literature strengthens the inferences and practical impact of simulation studies.
Impact on scientific community
The Julia package will provide such a database and interface, and will therefore lower the threshold for conducting (better) simulation studies. In doing so, it may enable more general claims and facilitate the preregistration of simulations. Furthermore, Bayesian methodology has highlighted the importance of incorporating prior knowledge in the form of prior distributions. Taxonomy.jl seeks to make this process more transparent and accessible for researchers and consumers of science, which can greatly facilitate cumulative science. Besides enabling better simulations, knowing how common different types of SEMs are may greatly help guide the development of new methodologies.
Index
Taxonomy.DOI
Taxonomy.ExtensiveMeta
Taxonomy.IncompleteMeta
Taxonomy.Judgements.J
Taxonomy.Judgements.Judgement
Taxonomy.LatentPathmodel
Taxonomy.Measurement
Taxonomy.MinimalMeta
Taxonomy.Model
Taxonomy.NoDOI
Taxonomy.NoLocation
Taxonomy.NoTaxonEver
Taxonomy.NoTaxonYet
Taxonomy.Record
Taxonomy.RecordDatabase
Taxonomy.SimpleLGCM
Taxonomy.Structural
Taxonomy.Study
Taxonomy.Taxon
Taxonomy.UnusualDOI
Taxonomy.UsualDOI
Taxonomy.Judgements.NoJudgement
Taxonomy.Judgements.certainty
Taxonomy.Judgements.rating
Taxonomy.MetaData
Taxonomy.apa
Taxonomy.author
Taxonomy.factor_variance
Taxonomy.generate_id
Taxonomy.journal
Taxonomy.json
Taxonomy.structural_model
Taxonomy.url
Taxonomy.valid_doi
Taxonomy.year
Database
Taxonomy.RecordDatabase — Type
RecordDatabase
Records need to be stored somewhere.
julia> RecordDatabase()
RecordDatabase{Base.UUID, Record}()
julia> first = Record(rater = "AP", id = "552ef675-5c7b-4ce1-880b-c45b833fdfcb", location = NoLocation(), meta = MetaData(missing, missing, missing));
julia> second = Record(rater = "AP", id = "58c55701-0362-40c7-849c-5d12e5026238", location = NoLocation(), meta = MetaData(missing, missing, missing));
julia> rd = RecordDatabase(first, second)
RecordDatabase{Base.UUID, Record} with 2 entries:
UUID("58c55701-0362-40c7… => Record…
UUID("552ef675-5c7b-4ce1… => Record…
julia> rd += Record(id="921c777a-0cc6-44da-a444-e610bfacbb07", rater="AP", location=NoLocation(), meta = MetaData(missing, missing, missing))
RecordDatabase{Base.UUID, Record} with 3 entries:
UUID("921c777a-0cc6-44da… => Record…
UUID("58c55701-0362-40c7… => Record…
UUID("552ef675-5c7b-4ce1… => Record…
Taxons
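using Taxonomy, AbstractTrees
using InteractiveUtils: subtypes  # loaded automatically in the REPL, needed in scripts
# Treat every DataType as a tree node whose children are its subtypes
AbstractTrees.children(d::DataType) = subtypes(d)
print_tree(Taxonomy.Taxon)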
Taxon
├─ AbstractCFA
│  └─ Measurement
├─ AbstractCLPM
├─ AbstractLGCM
│  └─ SimpleLGCM
├─ AbstractPathmodel
│  └─ Structural
└─ NoAbstractTaxon
   └─ NoTaxon
Taxonomy.Taxon — Type
Taxon is the supertype of all taxons.
CFA
Taxonomy.Measurement — Type
Measurement <: AbstractCFA. Building block for Taxonomy. Multiple Measurements can be combined into a Taxon.
Arguments
n_variables: Number of variables (possibly observed/manifest). If items are parceled, this is the number of parcels.
loadings: Vector of loadings, one for each item. If both standardized and unstandardized loadings are reported, code the standardized ones.
factor_variance: Variance of the factor.
error_variances: Vector of variances of the respective errors.
error_covariances_within: Vector of covariances within the factor. If unknown, set to missing; if there are no covariances, set to Float64[].
error_covariances_between: Vector of covariances the factor shares with a different factor. If unknown, set to missing; if there are no covariances, set to Float64[].
crossloadings_incoming: Vector of crossloadings coming from other factors. They should be lower than the loading on the item from this factor. If unknown, set to missing; if there are none, set to Float64[].
crossloadings_outgoing: Vector of crossloadings going to other items that have higher loadings from other factors. If unknown, set to missing; if there are none, set to Float64[].
quest_scale: Scale of the questionnaire. Anything above ten is coded as Inf. E.g., a 5-point Likert scale -> 5.
Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6, quest_scale = 5)
# output
Measurement
n_variables: JudgementInt{Int64}
loadings: JudgementVecNumber{Vector{Float64}}
factor_variance: JudgementNumber{Float64}
error_variances: JudgementVecNumber{Missing}
error_covariances_within: JudgementVecNumber{Missing}
error_covariances_between: JudgementVecNumber{Missing}
crossloadings_incoming: JudgementVecNumber{Missing}
crossloadings_outgoing: JudgementVecNumber{Missing}
quest_scale: JudgementNumber{Int64}
Pathmodels
Taxonomy.Structural — Type
Structural <: AbstractPathmodel. Consists of a graph from StenoGraphs (the structural model).
Arguments
structural_model: Graph from the StenoGraphs package. Defines the latent relations between the factors of measurement_model.
using StenoGraphs
graph = @StenoGraph begin
# latent regressions
fac1 → fac2
end
Structural(structural_model = graph)
Taxonomy.LatentPathmodel — Type
Create a new LatentPathmodel instance.
For indexing: my_model.structural_model.structural_model, my_model.measurement_model[:fac2]
using StenoGraphs
graph = @StenoGraph begin
# latent regressions
fac1 → fac2
end
my_model = LatentPathmodel(
Structural(structural_model = graph),
Dict(
:fac1 => Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6),
:fac2 => Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6)
)
)
# output
LatentPathmodel
structural_model: Structural
measurement_model: Dict{Symbol, Measurement}
Cross Lagged Panel Model
Linear Growth Curve Model
Taxonomy.SimpleLGCM — Type
SimpleLGCM <: AbstractLGCM. Taxon for the Linear Growth Curve Model.
Arguments
n_timepoints: Number of measurement timepoints.
timecoding: Vector containing the coding of the measurement timepoints (loadings of the slope onto the timepoints).
intercept: Intercept constant.
slope: Slope constant.
nonlinear_timecoding: Vector of the timecodings introduced by a nonlinear function.
variance_intercept: Variance of the intercept.
variance_slope: Variance of the slope.
covariance_intercept_slope: Covariance between intercept and slope.
variances_timepoints: Vector with the variances of the timepoint variables.
n_predictors: Number of predictors of intercept and slope.
predictor_paths_intercept: Vector of the predictor paths to the intercept.
predictor_paths_slope: Vector of the predictor paths to the slope.
SimpleLGCM(n_timepoints = 6, timecoding = [0, 1, 2, 3, 4, 5], intercept = 10.2,
slope = 0.96, nonlinear_timecoding = [1, 2, 4, 9, 16, 25], variance_intercept = 1, variance_slope = 1, covariance_intercept_slope = 0.1,
n_predictors = 2, predictor_paths_intercept = [2, 4], predictor_paths_slope = [3, 5])
# output
SimpleLGCM
n_timepoints: JudgementInt{Int64}
timecoding: JudgementVecNumber{Vector{Int64}}
intercept: JudgementNumber{Float64}
slope: JudgementNumber{Float64}
nonlinear_timecoding: JudgementVecNumber{Vector{Int64}}
variance_intercept: JudgementNumber{Int64}
variance_slope: JudgementNumber{Int64}
covariance_intercept_slope: JudgementNumber{Float64}
variances_timepoints: JudgementNumber{Missing}
n_predictors: JudgementInt{Int64}
predictor_paths_intercept: JudgementVecNumber{Vector{Int64}}
predictor_paths_slope: JudgementVecNumber{Vector{Int64}}
No Taxon
Taxonomy.NoTaxonEver — Type
NoTaxonEver()
A Taxon indicating that there is no SEM to code in this paper.
julia> NoTaxonEver()
NoTaxonEver
julia> NoTaxon()
NoTaxonEver
Taxonomy.NoTaxonYet — Type
NoTaxonYet()
A Taxon indicating that there is in fact a model to be coded, but that coding it is not currently possible. NoTaxonYet requires a timestamp 'accessdate' to record the point in time at which the model could not be coded. NoTaxonYet also lets you name the 'modeltype' you were not able to code, to spare your future self the work of going through everything again.
julia> NoTaxonYet()
Extractors
Taxonomy.factor_variance — Function
Function to extract the factor variance.
Arguments
x: Measurement or Standalone_Factor.
Return
Returns a Judgement.
f = Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 1.0)
rating(factor_variance(f))
# output
1.0
Taxonomy.structural_model — Function
Function to extract the StenoGraphs structural model from Structural.
Arguments
x: Structural.
Return
Returns a Judgement.
using Taxonomy
using StenoGraphs
graph = @StenoGraph begin
# latent regressions
fac1 → fac2
end
struct_model = Structural(structural_model = graph)
structural_model(struct_model)
Extract Structural part from a Taxon.
Levels
Taxonomy.Record — Type
Record(j...; rater = missing, id = missing, location = missing, meta = missing)
Record represents every paper(-like) item that is coded. It contains who is rating it, what is being rated (uniquely identified by an id), where to find it (location) and other metadata (usually inferred automatically). If id is missing, a warning will be generated and an ID will be suggested. This ID is linked to the location (either a DOI or URL). If no location is provided, the ID will be generated at random. You can use the suggested ID or generate your own using generate_id(DOI("yourdoi")) or generate_id(url("yoururl")).
Taxonomy.Study — Type
Study()
Use this function to group together multiple Judgements and/or Taxons on the Study level.
Return: a dictionary containing StudyJudgements and/or Taxons.
Taxonomy.Model — Type
Model()
Use this function to group together multiple Judgements and/or Taxons on the Model level.
Return: a dictionary containing ModelJudgements and/or Taxons.
ID
Taxonomy.generate_id — Function
Generate an Entry ID
To create links between entries we need a stable reference point. This ID is initially generated from url(location); if the URL is missing, it is generated randomly. Once generated, the ID is saved with the Record and should not be changed.
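The scheme described above (a reproducible ID when a URL is available, a random one otherwise) can be sketched with Julia's UUIDs standard library. This is a minimal sketch and not the package's implementation: the helper name sketch_id and the choice of a version-5 UUID under the RFC 4122 URL namespace are assumptions made purely for illustration.

```julia
using UUIDs

# RFC 4122 namespace UUID for URLs; using it here is an assumption of this sketch.
const URL_NAMESPACE = UUID("6ba7b811-9dad-11d1-80b4-00c04fd430c8")

# Hypothetical helper (not part of Taxonomy.jl):
# the same URL always maps to the same version-5 UUID.
sketch_id(url::AbstractString) = uuid5(URL_NAMESPACE, String(url))

# Without a URL there is nothing to derive the ID from, so draw a random one.
sketch_id(::Missing) = uuid4()

a = sketch_id("https://doi.org/10.5281/ZENODO.6719627")
b = sketch_id("https://doi.org/10.5281/ZENODO.6719627")
c = sketch_id(missing)

a == b  # true: the ID is stable for a given URL
```

Because a random ID cannot be regenerated later, it only serves as a stable reference if it is stored with the entry, which is why the ID should not be changed once saved.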
Judgement
Taxonomy.Judgements.Judgement — Type
Judgement(r::Union{<: Any, Missing}, c = 1.0, l = missing)
Level: Taxonomy.Judgements.AnyLevelJudgement
A generic judgement without any checks on content.
Arguments
rating: The rating, e.g. "Structural" or 1.0.
certainty: If uncertain, a number between 0.0 and 1.0 (0-100%).
comment: Information on why the judgement was made; may contain information about the source within the paper, e.g., section, page, table number, figure number.
julia> Judgement(1.0, .99, "Figure 1");
Taxonomy.Judgements.J — Type
Shorthand for Judgement
Taxonomy.Judgements.NoJudgement — Function
Abstaining from any judgement.
This implies that your best guess is missing and you are absolutely uncertain about this judgement.
julia> NoJudgement()
Judgement{Missing}(missing, 0.0, missing)
Taxonomy.Judgements.rating — Function
Extract rating from Judgement.
If rating is called on a Judgement, it returns the rating; on anything else it acts as the identity. If rating is called on a JudgementLevel together with a field name, it returns the rating of that field.
Taxonomy.Judgements.certainty — Function
Extract certainty from Judgement.
If certainty is called on a Judgement, it returns the certainty; on anything else it acts as the identity. If certainty is called on a JudgementLevel together with a field name, it returns the certainty of that field.
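The three call patterns described for rating and certainty (on a judgement, on anything else, and on a level plus a field name) map naturally onto multiple dispatch. The following is a toy sketch of that pattern only: ToyJudgement, ToyLevel, toy_rating and toy_certainty are hypothetical stand-ins and not the types or functions from Taxonomy.Judgements.

```julia
# Toy stand-in for a judgement: a value, how certain we are, and an optional comment.
struct ToyJudgement{T}
    rating::T
    certainty::Float64
    comment::Union{String, Missing}
end

# Toy stand-in for a level whose fields hold judgements.
struct ToyLevel
    n_variables::ToyJudgement{Int}
    factor_variance::ToyJudgement{Float64}
end

# On a judgement: return the stored value.
toy_rating(j::ToyJudgement) = j.rating
toy_certainty(j::ToyJudgement) = j.certainty

# On anything else: act as the identity.
toy_rating(x) = x
toy_certainty(x) = x

# On a level plus a field name: look up the field, then extract from it.
toy_rating(l::ToyLevel, field::Symbol) = toy_rating(getfield(l, field))
toy_certainty(l::ToyLevel, field::Symbol) = toy_certainty(getfield(l, field))

level = ToyLevel(ToyJudgement(2, 1.0, missing), ToyJudgement(0.6, 0.9, "Table 2"))

toy_rating(level, :factor_variance)     # 0.6
toy_certainty(level, :factor_variance)  # 0.9
toy_rating("no judgement here")         # "no judgement here"
```

The identity fallback means calling code does not have to check whether a value is wrapped in a judgement before extracting it.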
Metadata
Taxonomy.MetaData — Function
Save metadata.
Can be constructed from complete minimal metadata, from incomplete metadata or, preferably, from a DOI.
julia> min = MetaData("Peikert, Aaron", 2022, "Journal of Statistical Software");
julia> incomplete = MetaData("Peikert, Aaron", 2022, missing);
julia> extensive = MetaData(DOI("10.5281/zenodo.6719627"));
Taxonomy.apa — Function
Get an APA citation.
julia> apa(DOI("10.5281/zenodo.6719627"))
"Ernst, M. S., & Peikert, A. (2022). <i>StructuralEquationModels.jl</i> (Version v0.1.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.6719627"
Taxonomy.json — Function
Get a Citeproc JSON.
CSL JSON can be read by Zotero and is generated automatically by doi.org from a DOI. All available information is included and saved in a Dict.
julia> json(DOI("10.5281/zenodo.6719627"))
Dict{String, Any} with 11 entries:
"publisher" => "Zenodo"
"issued" => Dict{String, Any}("date-parts"=>Any[Any[2022, 6, 24]])
"author" => Any[Dict{String, Any}("family"=>"Ernst", "given"=>"Maximilian …
"id" => "https://doi.org/10.5281/zenodo.6719627"
"copyright" => "MIT License"
"version" => "v0.1.0"
"DOI" => "10.5281/ZENODO.6719627"
"URL" => "https://zenodo.org/record/6719627"
"title" => "StructuralEquationModels.jl"
"abstract" => "StructuralEquationModels v0.1.0 This is a package for Structu…
"type" => "book"
Taxonomy.author — Function
Extract the author.
julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));
julia> author(doi)
"Frank, Henry S."
Taxonomy.year — Function
Extract the year.
julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));
julia> year(doi)
1970
Taxonomy.journal — Function
Extract the journal.
julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));
julia> journal(doi)
"Science"
Taxonomy.MinimalMeta — Type
A representation of the most important metadata.
julia> min = MetaData("Peikert, Aaron", 2022, "Journal of Statistical Software");
julia> typeof(min)
MinimalMeta
Taxonomy.IncompleteMeta — Type
A representation of metadata when we cannot even capture the most important metadata.
julia> incomplete = MetaData(missing, 2022, "Journal of Statistical Software");
julia> typeof(incomplete)
IncompleteMeta
Taxonomy.ExtensiveMeta — Type
The metadata we can gather from doi.org.
julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));
julia> typeof(doi)
ExtensiveMeta{MinimalMeta}
DOI
Taxonomy.DOI — Type
Alias for UsualDOI.
Taxonomy.UsualDOI — Type
Construct a validated DOI
Most valid DOIs (not all) can be validated simply via a regular expression.
Arguments
doi::String: a DOI without resolver (e.g. without doi.org); capitalization does not matter
fallback::String: an optional fallback link where the content might be found in case the DOI fails
julia> DOI("10.5281/zenodo.6719627")
UsualDOI{String, Missing}("10.5281/ZENODO.6719627", missing)
julia> DOI("10.5281/zenodo.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
UsualDOI{String, String}("10.5281/ZENODO.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
Taxonomy.UnusualDOI — Type
Construct an unvalidated DOI
You should prefer a validated UsualDOI, but if you have tested the DOI and are sure it links where it is supposed to link, go ahead and create an unvalidated DOI.
julia> UnusualDOI("weird10.5281doi/zenodo.6719627")
UnusualDOI{String, Missing}("WEIRD10.5281DOI/ZENODO.6719627", missing)
julia> UnusualDOI("weird10.5281doi/zenodo.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
UnusualDOI{String, String}("WEIRD10.5281DOI/ZENODO.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
Taxonomy.NoDOI — Type
What to do if there is no DOI
Last resort if there is no DOI. Then we save other metadata, similar to BibTeX.
Arguments
url::String: a link where one may be able to find the content
author::String: as in BibTeX, e.g. "Peikert, Aaron and Ernst, Maximilian S. and Bode, Clifford"
date::Union{Date, Missing}: optional date
year::Union{Int64}: optional if date is supplied
journal::String: the outlet of the publication
other::Dict: more BibTeX-like metadata
using Dates  # provides Date, used for the date keyword below
NoDOI(
url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
author = "Ernst, Maximilian Stefan and Peikert, Aaron",
title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
date = Date("2022-06-24"), # year is inferred
journal = "No Real Journal",
awesome = "Yes", # other metadata
software = "naturally", # some more metadata
citations = 500
)
NoDOI(
url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
author = "Ernst, Maximilian Stefan and Peikert, Aaron",
title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
year = 2022, # date is omitted
journal = "No Real Journal"
)
Taxonomy.NoLocation — Type
When everything fails.
This is a placeholder if really no location can be found.
Taxonomy.url — Function
Get URL from location.
julia> url(DOI("10.1126/SCIENCE.169.3946.635"))
"https://doi.org/10.1126/SCIENCE.169.3946.635"
location = NoDOI(
url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
author = "Ernst, Maximilian Stefan and Peikert, Aaron",
title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
year = 2022, # date is omitted
journal = "No Real Journal"
)
url(location)
# output
"https://github.com/StructuralEquationModels/StructuralEquationModels.jl"
Taxonomy.valid_doi — Function
Validate DOI via Regex
Regular expression taken from:
https://www.crossref.org/blog/dois-and-matching-regular-expressions/
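As a minimal sketch of regex-based DOI validation, the snippet below uses a pattern adapted from the one for modern DOIs in the Crossref post linked above (with the dot after the 10 prefix escaped). Whether valid_doi uses exactly this pattern is an assumption, and looks_like_doi is a hypothetical helper, not the package's function.

```julia
# Pattern for modern Crossref DOIs; the trailing `i` flag makes it case-insensitive.
const DOI_PATTERN = r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$"i

# Hypothetical helper (not Taxonomy.valid_doi): true if the whole string matches.
looks_like_doi(s::AbstractString) = occursin(DOI_PATTERN, s)

looks_like_doi("10.5281/zenodo.6719627")                  # true
looks_like_doi("10.1126/SCIENCE.169.3946.635")            # true
looks_like_doi("weird10.5281doi/zenodo.6719627")          # false
looks_like_doi("https://doi.org/10.5281/zenodo.6719627")  # false: resolver prefix included
```

A pattern like this covers most but not all valid DOIs, which is consistent with the package offering UnusualDOI as an unvalidated alternative.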