Taxonomy

What?

Taxonomy.jl aims to serve as a comprehensive database of structural equation models (SEMs) that can be used to infer distributions of both structures (e.g., types of models, numbers of observed and latent variables) and parameters (e.g., the average factor loading). This will greatly facilitate simulation studies that accurately reflect real-world conditions, and it takes seriously the idea that "Simulation studies are to a statistician what experiments are to a scientist" (Pawel & Kook et al.). Having a common basis for setting parameters in simulations will also reduce the extremely wide latitude that statisticians have to create an overly positive image of the strengths of novel methods. So-called researcher degrees of freedom are already a concern in empirical studies, but simulation studies exacerbate the issue by allowing almost unlimited freedom over the data-generating process. Additionally, Taxonomy.jl will provide a user-friendly interface for researchers to easily sample these parameters for use in their own simulation studies.

End product

A Julia package that enables filtering a taxonomy database and constructing samplers for structures and parameters. These samplers can quickly be turned into models for StructuralEquationModels.jl.
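
To make the idea of a sampler concrete, here is a minimal sketch that uses only the Julia standard library. It is not the package's interface: `make_sampler` and `coded_loadings` are hypothetical names, and the numbers are placeholders rather than values from any coded study.

```julia
using Random

# Hypothetical placeholder values standing in for factor loadings that
# would come from a filtered taxonomy database; these are not real data.
coded_loadings = [0.45, 0.62, 0.71, 0.80, 0.55]

# In this sketch a sampler is a closure that draws with replacement
# from the empirical values it was built from.
make_sampler(values) = (rng, n) -> rand(rng, values, n)

sample_loadings = make_sampler(coded_loadings)

# Draw three loadings with a seeded generator so the draw is reproducible.
draws = sample_loadings(MersenneTwister(1), 3)
```

Drawing from the empirical values, rather than from a distribution chosen by hand, is what ties the simulation to the coded literature.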

So what?

Simulations are only helpful insofar as they reflect some (simplified) aspects of reality. A simulation that is based only on the guesses of the researchers conducting it is prone to misrepresent realistic conditions and may even unduly favour the procedures under investigation. Being able to base simulations on a sample of the literature strengthens the inference and practical impact of simulation studies.

Impact on scientific community

The Julia package will provide such a database and interface, and thereby lower the threshold for conducting (better) simulation studies. In this way, it may enable more general claims and facilitate preregistration of simulations. Furthermore, Bayesian methodology has highlighted the importance of incorporating prior knowledge in the form of prior distributions. Taxonomy.jl seeks to make this process more transparent and accessible for researchers and consumers of science, which can greatly facilitate cumulative science. Besides enabling better simulations, knowing how common different types of SEMs are may greatly help guide the development of new methodologies.

Database

Taxonomy.RecordDatabase - Type

RecordDatabase

Records need to be stored somewhere.

julia> RecordDatabase()
RecordDatabase{Base.UUID, Record}()

julia> first = Record(rater = "AP", id = "552ef675-5c7b-4ce1-880b-c45b833fdfcb", location = NoLocation(), meta = MetaData(missing, missing, missing));

julia> second = Record(rater = "AP", id = "58c55701-0362-40c7-849c-5d12e5026238", location = NoLocation(), meta = MetaData(missing, missing, missing));

julia> rd = RecordDatabase(first, second)
RecordDatabase{Base.UUID, Record} with 2 entries:
  UUID("58c55701-0362-40c7… => Record…
  UUID("552ef675-5c7b-4ce1… => Record…

julia> rd += Record(id="921c777a-0cc6-44da-a444-e610bfacbb07", rater="AP", location=NoLocation(), meta = MetaData(missing, missing, missing))
RecordDatabase{Base.UUID, Record} with 3 entries:
  UUID("921c777a-0cc6-44da… => Record…
  UUID("58c55701-0362-40c7… => Record…
  UUID("552ef675-5c7b-4ce1… => Record…
source
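
The printed output above suggests a dictionary-like container keyed by UUID. The following standard-library sketch illustrates that idea only. `SketchRecord`, `sketch_database` and `add_record` are hypothetical names, and the real `Record` carries more fields than shown here.

```julia
using UUIDs

# Hypothetical stand-in for a record; only the id and rater are kept.
struct SketchRecord
    id::UUID
    rater::String
end

# Store the records in a Dict keyed by their UUID.
sketch_database(records::SketchRecord...) =
    Dict{UUID, SketchRecord}(r.id => r for r in records)

# Adding a record returns a new Dict and leaves the old one untouched.
add_record(db::Dict{UUID, SketchRecord}, r::SketchRecord) =
    merge(db, Dict(r.id => r))

first_record = SketchRecord(UUID("552ef675-5c7b-4ce1-880b-c45b833fdfcb"), "AP")
second_record = SketchRecord(UUID("58c55701-0362-40c7-849c-5d12e5026238"), "AP")

db = sketch_database(first_record, second_record)
db = add_record(db, SketchRecord(UUID("921c777a-0cc6-44da-a444-e610bfacbb07"), "AP"))
```

Keying by ID means that adding a record whose ID is already present replaces the stored entry instead of duplicating it.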

Taxons

using Taxonomy, AbstractTrees
AbstractTrees.children(d::DataType) = subtypes(d)
print_tree(Taxonomy.Taxon)
Taxon
├─ AbstractCFA
│  └─ Measurement
├─ AbstractCLPM
├─ AbstractLGCM
│  └─ SimpleLGCM
├─ AbstractPathmodel
│  └─ Structural
└─ NoAbstractTaxon
   └─ NoTaxon

CFA

Taxonomy.Measurement - Type

Measurement <: AbstractCFA. Building block for the taxonomy. Multiple Measurements can be combined into a Taxon.

Arguments

  • n_variables: Number of variables (possibly observed/manifest). If items are parceled, this is the number of parcels.
  • loadings: Vector of loadings, one for each item. If both standardized and unstandardized loadings are reported, code standardized.
  • factor_variance: Variance of the factor.
  • error_variances: Vector of the variances of the respective errors.
  • error_covariances_within: Vector of error covariances within the factor. If unknown, set to missing; if there are no covariances, set to Float64[].
  • error_covariances_between: Vector of error covariances the factor shares with a different factor. If unknown, set to missing; if there are no covariances, set to Float64[].
  • crossloadings_incoming: Vector of crossloadings coming from other factors. They should be lower than the loading the item receives from this factor. If unknown, set to missing; if there are none, set to Float64[].
  • crossloadings_outgoing: Vector of crossloadings going to items that have higher loadings from other factors. If unknown, set to missing; if there are none, set to Float64[].
  • quest_scale: Scale of the questionnaire. Anything more than ten is Inf. E.g., a 5-point Likert scale -> 5.
Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6, quest_scale = 5)

# output

Measurement
   n_variables: JudgementInt{Int64}
   loadings: JudgementVecNumber{Vector{Float64}}
   factor_variance: JudgementNumber{Float64}
   error_variances: JudgementVecNumber{Missing}
   error_covariances_within: JudgementVecNumber{Missing}
   error_covariances_between: JudgementVecNumber{Missing}
   crossloadings_incoming: JudgementVecNumber{Missing}
   crossloadings_outgoing: JudgementVecNumber{Missing}
   quest_scale: JudgementNumber{Int64}
source
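
To show how the coded quantities relate to one another, the sketch below computes the model-implied covariance matrix of a single factor under the usual one-factor CFA assumptions, with no error covariances and no crossloadings. It uses only LinearAlgebra and is not part of Taxonomy.jl. The loadings and factor variance repeat the example above, while the error variances are hypothetical values chosen for illustration.

```julia
using LinearAlgebra

loadings = [1.0, 0.4]          # as in the Measurement example
factor_variance = 0.6          # as in the Measurement example
error_variances = [0.5, 0.5]   # hypothetical values, not from the source

# Implied covariance of the observed variables for one factor:
# Sigma = factor_variance * loadings * loadings' + Diagonal(error_variances)
implied_cov = factor_variance * (loadings * loadings') + Diagonal(error_variances)
```

The diagonal holds the implied variances of the two observed variables and the off-diagonal entry their implied covariance.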

Pathmodels

Taxonomy.Structural - Type

Structural <: AbstractPathmodel. Consists of a graph from StenoGraphs (the structural model).

Arguments

  • structural_model: Graph from StenoGraphs package. Defines the latent relations between the factors of measurement_model.
using StenoGraphs

graph = @StenoGraph begin
    # latent regressions
    fac1 → fac2
end

Structural(structural_model = graph)
source
Taxonomy.LatentPathmodel - Type

Create a new LatentPathmodel instance.

For indexing: my_model.structural_model.structural_model and my_model.measurement_model[:fac2]

using StenoGraphs
graph = @StenoGraph begin
    # latent regressions
    fac1 → fac2
end

my_model = LatentPathmodel(
    Structural(structural_model = graph),
    Dict(
        :fac1 => Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6),
        :fac2 => Measurement(n_variables = 2, loadings = [1, 0.4], factor_variance = 0.6)
    )
)

# output

LatentPathmodel
   structural_model: Structural
   measurement_model: Dict{Symbol, Measurement}
source

Cross Lagged Panel Model

Linear Growth Curve Model

Taxonomy.SimpleLGCM - Type

SimpleLGCM <: AbstractLGCM. Taxon for a Linear Growth Curve Model.

Arguments

  • n_timepoints: Number of measurement timepoints.
  • timecoding: Vector containing the coding of the measurement time points (loadings of the slope onto the timepoints).
  • intercept: Intercept constant.
  • slope: Slope constant.
  • nonlinear_timecoding: Vector for the timecodings introduced by a nonlinear function.
  • variance_intercept: Variance of the intercept.
  • variance_slope: Variance of the slope.
  • covariance_intercept_slope: Covariance between intercept and slope.
  • variances_timepoints: Vector with variances of the timepoint variables.
  • n_predictors: Number of predictors on intercept and slope.
  • predictor_paths_intercept: Vector for the predictor-paths to the intercept.
  • predictor_paths_slope: Vector for the predictor-paths to the slope.
SimpleLGCM(n_timepoints = 6, timecoding = [0, 1, 2, 3, 4, 5],
    intercept = 10.2, slope = 0.96, nonlinear_timecoding = [1, 2, 4, 9, 16, 25],
    variance_intercept = 1, variance_slope = 1, covariance_intercept_slope = 0.1,
    n_predictors = 2, predictor_paths_intercept = [2, 4], predictor_paths_slope = [3, 5])

# output
SimpleLGCM
   n_timepoints: JudgementInt{Int64}
   timecoding: JudgementVecNumber{Vector{Int64}}
   intercept: JudgementNumber{Float64}
   slope: JudgementNumber{Float64}
   nonlinear_timecoding: JudgementVecNumber{Vector{Int64}}
   variance_intercept: JudgementNumber{Int64}
   variance_slope: JudgementNumber{Int64}
   covariance_intercept_slope: JudgementNumber{Float64}
   variances_timepoints: JudgementNumber{Missing}
   n_predictors: JudgementInt{Int64}
   predictor_paths_intercept: JudgementVecNumber{Vector{Int64}}
   predictor_paths_slope: JudgementVecNumber{Vector{Int64}}
source
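
As a worked illustration of how these quantities combine in a linear growth curve model, the sketch below computes the implied mean and the latent part of the variance at each timepoint. It assumes that intercept and slope denote the means of the latent intercept and slope, which the argument list does not state explicitly. It uses plain Julia and does not call Taxonomy.jl.

```julia
# Values repeated from the SimpleLGCM example.
timecoding = [0, 1, 2, 3, 4, 5]
intercept = 10.2
slope = 0.96
variance_intercept = 1.0
variance_slope = 1.0
covariance_intercept_slope = 0.1

# Implied mean at each timepoint: intercept + slope * t
implied_means = intercept .+ slope .* timecoding

# Latent part of the variance at time t (residual variance not included):
# Var(intercept) + t^2 * Var(slope) + 2 * t * Cov(intercept, slope)
latent_variance(t) =
    variance_intercept + t^2 * variance_slope + 2 * t * covariance_intercept_slope

latent_variances = latent_variance.(timecoding)
```

Adding the entries of variances_timepoints, where they are reported, would give the full implied variances of the observed timepoint variables.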

No Taxon

Taxonomy.NoTaxonEver - Type

NoTaxonEver()

A Taxon indicating that there is no SEM to code in this paper.

julia> NoTaxonEver()
NoTaxonEver

julia> NoTaxon()
NoTaxonEver
source
Taxonomy.NoTaxonYet - Type
NoTaxonYet()

A Taxon indicating that there is in fact a model to be coded, but that coding it is not possible at this point. NoTaxonYet requires a timestamp 'accessdate' that records when it was not possible to code the respective model. NoTaxonYet also gives you the option to name the 'modeltype' you were not able to code, to spare your future self the work of going through everything again.

julia> NoTaxonYet()

source

Extractors

Taxonomy.structural_model - Function

Extract the StenoGraphs structural model from a Structural.

Arguments

Return

Returns a Judgement

using Taxonomy
using StenoGraphs

graph = @StenoGraph begin
    # latent regressions
    fac1 → fac2
end

struct_model = Structural(structural_model = graph)

structural_model(struct_model)
source

Extract Structural part from a Taxon.

source

Levels

Taxonomy.Record - Type
Record(j...; rater = missing, id = missing, location = missing, meta = missing)

Record represents every paper(-like) item that is coded. It contains who is rating it (rater), what is being rated (uniquely identified by an id), where to find it (location), and other metadata (usually inferred automatically). If id is missing, a warning is generated and an ID is suggested. This ID is linked to the location (either a DOI or a URL). If no location is provided, the ID is generated at random. You can use the suggested ID or generate your own using generate_id(DOI("yourdoi")) or generate_id(url("yoururl")).

source
Taxonomy.Study - Type
Study()

Use this function to group together multiple Judgements and/or Taxons on the Study level.

Return: Output value will be a dictionary containing StudyJudgements and/or Taxons.

source
Taxonomy.Model - Type
Model()

Use this function to group together multiple Judgements and/or Taxons on the Model level.

Return: Output value will be a dictionary containing ModelJudgements and/or Taxons.

source

ID

Taxonomy.generate_id - Function

Generate an Entry ID

To create links between entries we need a stable reference point. This ID is generated initially from url(location) and if the url is missing, it is generated randomly. After the ID is generated once, it is saved with the Record and should not be changed.

source
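
One way to obtain an ID that is stable for a given URL and random otherwise is a name-based UUID. The sketch below illustrates that idea with the UUIDs standard library. It is an assumption for illustration, not the scheme generate_id actually uses, and `sketch_id` is a hypothetical name. The namespace constant is the standard RFC 4122 URL namespace.

```julia
using UUIDs

# Standard RFC 4122 namespace for URLs.
const URL_NAMESPACE = UUID("6ba7b811-9dad-11d1-80b4-00c04fd430c8")

# Deterministic: the same URL always maps to the same ID.
sketch_id(url::AbstractString) = uuid5(URL_NAMESPACE, String(url))

# No URL available: fall back to a random ID.
sketch_id(::Missing) = uuid4()

repo = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl"
id_a = sketch_id(repo)
id_b = sketch_id(repo)
id_random = sketch_id(missing)
```

Because a random ID cannot be regenerated later, it has to be stored with the record once created, which matches the advice above not to change a saved ID.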

Judgement

Taxonomy.Judgements.Judgement - Type
Judgement(r::Union{<: Any, Missing}, c = 1.0, l = missing)

Level: Taxonomy.Judgements.AnyLevelJudgement

A generic judgment without any checks on content.

Arguments

  • rating: The rating, e.g. "Structural" or 1.0.
  • certainty: If uncertain, a number between 0.0 and 1.0 (0-100%)
  • comment: information on why the judgement was made, may contain information about the source within the paper, e.g., section, page, table number, figure number.
julia> Judgement(1.0, .99, "Figure 1");
source
Taxonomy.Judgements.NoJudgement - Function

Abstaining from any judgement.

This implies that your best guess is missing and you are absolutely uncertain about this judgement.

julia> NoJudgement()
Judgement{Missing}(missing, 0.0, missing)
source
Taxonomy.Judgements.rating - Function

Extract rating from Judgement.

If rating is called on a Judgement, it returns the rating; on anything else, it acts as the identity and returns its argument unchanged. If rating is called on a JudgementLevel together with a field name, it returns the rating of that field.

source
Taxonomy.Judgements.certainty - Function

Extract certainty from Judgement.

If certainty is called on a Judgement, it returns the certainty; on anything else, it acts as the identity and returns its argument unchanged. If certainty is called on a JudgementLevel together with a field name, it returns the certainty of that field.

source
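
The behaviour described for rating and certainty, returning a field for a Judgement and the argument itself for anything else, maps naturally onto multiple dispatch. The sketch below illustrates this with a hypothetical `SketchJudgement` type and hypothetical `sketch_rating` and `sketch_certainty` functions. It mirrors the documented fields but is not the package's implementation.

```julia
# Hypothetical stand-in with the three documented fields.
struct SketchJudgement{T}
    rating::T
    certainty::Float64
    comment::Union{String, Missing}
end

# Convenience constructor: certainty defaults to 1.0, comment to missing.
SketchJudgement(rating, certainty = 1.0) = SketchJudgement(rating, certainty, missing)

# Specific methods for judgements, generic fallbacks acting as the identity.
sketch_rating(j::SketchJudgement) = j.rating
sketch_rating(x) = x
sketch_certainty(j::SketchJudgement) = j.certainty
sketch_certainty(x) = x

j = SketchJudgement(1.0, 0.99, "Figure 1")
abstained = SketchJudgement(missing, 0.0)
```

With the fallback methods, code that extracts ratings can be applied uniformly to collections that mix judgements and plain values.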

Metadata

Taxonomy.MetaData - Function

Save metadata.

Metadata can be constructed from complete minimal metadata, from incomplete metadata or, preferably, from a DOI.

julia> min = MetaData("Peikert, Aaron", 2022, "Journal of Statistical Software");

julia> incomplete = MetaData("Peikert, Aaron", 2022, missing);

julia> extensive = MetaData(DOI("10.5281/zenodo.6719627"));
source
Taxonomy.apa - Function

Get an APA citation.

julia> apa(DOI("10.5281/zenodo.6719627"))
"Ernst, M. S., &amp; Peikert, A. (2022). <i>StructuralEquationModels.jl</i> (Version v0.1.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.6719627"
source
Taxonomy.json - Function

Get a Citeproc JSON.

CSL JSON Documentation

CSL JSON can be read by Zotero and is generated automatically by doi.org from a DOI. All available information is included and saved in a Dict.

julia> json(DOI("10.5281/zenodo.6719627"))
Dict{String, Any} with 11 entries:
  "publisher" => "Zenodo"
  "issued"    => Dict{String, Any}("date-parts"=>Any[Any[2022, 6, 24]])
  "author"    => Any[Dict{String, Any}("family"=>"Ernst", "given"=>"Maximilian …
  "id"        => "https://doi.org/10.5281/zenodo.6719627"
  "copyright" => "MIT License"
  "version"   => "v0.1.0"
  "DOI"       => "10.5281/ZENODO.6719627"
  "URL"       => "https://zenodo.org/record/6719627"
  "title"     => "StructuralEquationModels.jl"
  "abstract"  => "StructuralEquationModels v0.1.0 This is a package for Structu…
  "type"      => "book"
source
Taxonomy.author - Function

Extract the author.

julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));

julia> author(doi)
"Frank, Henry S."
source
Taxonomy.year - Function

Extract the year.

julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));

julia> year(doi)
1970
source
Taxonomy.journal - Function

Extract the journal.

julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));

julia> journal(doi)
"Science"
source
Taxonomy.MinimalMeta - Type

A representation of the most important metadata.

julia> min = MetaData("Peikert, Aaron", 2022, "Journal of Statistical Software");

julia> typeof(min)
MinimalMeta
source
Taxonomy.IncompleteMeta - Type

A representation of metadata when we cannot capture even the most important metadata.

julia> incomplete = MetaData(missing, 2022, "Journal of Statistical Software");

julia> typeof(incomplete)
IncompleteMeta
source
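
The two examples above show the same constructor call producing different types depending on whether a field is missing. The sketch below shows one way to get that behaviour with multiple dispatch. The `Sketch`-prefixed names are hypothetical, and the package may implement the distinction differently.

```julia
# Hypothetical stand-ins for complete and incomplete metadata.
struct SketchMinimal
    author::String
    year::Int
    journal::String
end

struct SketchIncomplete
    author::Union{String, Missing}
    year::Union{Int, Missing}
    journal::Union{String, Missing}
end

# All three fields present: the most specific method applies.
sketch_meta(author::String, year::Int, journal::String) =
    SketchMinimal(author, year, journal)

# Fallback for any combination that includes `missing`.
sketch_meta(author, year, journal) = SketchIncomplete(author, year, journal)

complete = sketch_meta("Peikert, Aaron", 2022, "Journal of Statistical Software")
partial = sketch_meta(missing, 2022, "Journal of Statistical Software")
```

Encoding completeness in the type lets later code require complete metadata through a method signature rather than a runtime check.
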
Taxonomy.ExtensiveMeta - Type

The metadata we can gather from doi.org.

julia> doi = MetaData(DOI("10.1126/SCIENCE.169.3946.635"));

julia> typeof(doi)
ExtensiveMeta{MinimalMeta}
source

DOI

Taxonomy.UsualDOI - Type

Construct a validated DOI

Most valid DOIs (not all) can be simply validated via a regular expression.

Arguments

  • doi::String: a DOI without the resolver (e.g., without doi.org); capitalization does not matter.
  • fallback::String: an optional fallback link where the content may be found in case the DOI fails.
julia> DOI("10.5281/zenodo.6719627")
UsualDOI{String, Missing}("10.5281/ZENODO.6719627", missing)

julia> DOI("10.5281/zenodo.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
UsualDOI{String, String}("10.5281/ZENODO.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
source
Taxonomy.UnusualDOI - Type

Construct an unvalidated DOI

You should prefer a validated UsualDOI, but if you have tested the DOI and are sure it links where it is supposed to link, go ahead and create an unvalidated DOI.

julia> UnusualDOI("weird10.5281doi/zenodo.6719627")
UnusualDOI{String, Missing}("WEIRD10.5281DOI/ZENODO.6719627", missing)

julia> UnusualDOI("weird10.5281doi/zenodo.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
UnusualDOI{String, String}("WEIRD10.5281DOI/ZENODO.6719627", "https://github.com/StructuralEquationModels/StructuralEquationModels.jl")
source
Taxonomy.NoDOI - Type

What to do if there is no DOI

Last resort if there is no DOI. Then we save other metadata, similar to BibTeX.

Arguments

  • url::String: a link where the content may be found
  • author::String: as in BibTeX, e.g. "Peikert, Aaron and Ernst, Maximilian S. and Bode, Clifford"
  • date::Union{Date, Missing}: optional date
  • year::Union{Int64}: optional if date is supplied
  • journal::String: the outlet of the publication
  • other::Dict: more BibTeX-like metadata
using Dates # provides Date; may be unnecessary if Taxonomy re-exports it

NoDOI(
    url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
    author = "Ernst, Maximilian Stefan and Peikert, Aaron",
    title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
    date = Date("2022-06-24"), # year is inferred
    journal = "No Real Journal",
    awesome = "Yes", # other metadata
    software = "naturally", # some more metadata
    citations = 500
)
NoDOI(
    url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
    author = "Ernst, Maximilian Stefan and Peikert, Aaron",
    title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
    year = 2022, # date is omitted
    journal = "No Real Journal"
)
source
Taxonomy.url - Function

Get URL from location.

julia> url(DOI("10.1126/SCIENCE.169.3946.635"))
"https://doi.org/10.1126/SCIENCE.169.3946.635"
location = NoDOI(
    url = "https://github.com/StructuralEquationModels/StructuralEquationModels.jl",
    author = "Ernst, Maximilian Stefan and Peikert, Aaron",
    title = "StructuralEquationModels.jl: A fast and flexible SEM framework",
    year = 2022, # date is omitted
    journal = "No Real Journal"
)

url(location)

# output

"https://github.com/StructuralEquationModels/StructuralEquationModels.jl"
source
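
The two examples above suggest that url prepends the doi.org resolver to a stored, upper-cased DOI and returns the stored link for locations without a DOI. The sketch below reproduces that behaviour with hypothetical stand-in types (`SketchDOI`, `SketchNoDOI`) and a hypothetical `sketch_url` function. It does not validate the DOI and is not the package's code.

```julia
# Hypothetical stand-in: the DOI is normalised to upper case on construction.
struct SketchDOI
    doi::String
    SketchDOI(doi::AbstractString) = new(uppercase(doi))
end

# Hypothetical stand-in for a location that only has a URL.
struct SketchNoDOI
    url::String
end

# One method per location type.
sketch_url(location::SketchDOI) = "https://doi.org/" * location.doi
sketch_url(location::SketchNoDOI) = location.url

doi_url = sketch_url(SketchDOI("10.1126/science.169.3946.635"))
repo_url = sketch_url(SketchNoDOI("https://github.com/StructuralEquationModels/StructuralEquationModels.jl"))
```

Normalising the capitalisation once, at construction, keeps two spellings of the same DOI from producing different URLs.
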
Taxonomy.valid_doi - Function

Validate DOI via Regex

Regular expression taken from:

https://www.crossref.org/blog/dois-and-matching-regular-expressions/

source
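
For illustration, the sketch below applies the first pattern recommended in the Crossref post linked above, written as a case-insensitive Julia regex. Whether valid_doi uses exactly this pattern is an assumption, and `sketch_valid_doi` is a hypothetical name.

```julia
# Pattern from the Crossref blog post for modern DOIs. The `i` flag makes it
# case-insensitive; the dot after "10" is left unescaped, as published.
const CROSSREF_DOI_PATTERN = r"^10.\d{4,9}/[-._;()/:A-Z0-9]+$"i

# True if the whole string matches the pattern.
sketch_valid_doi(doi::AbstractString) = occursin(CROSSREF_DOI_PATTERN, doi)

usual = sketch_valid_doi("10.5281/zenodo.6719627")
unusual = sketch_valid_doi("weird10.5281doi/zenodo.6719627")
```

The second string fails the check because it does not start with the 10 prefix. An UnusualDOI is meant for cases like this, where the DOI has been tested by hand.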