QA catalogue for analysing library data

Bibbi-katalogen (fullstendig)     number of records:

Thompson–Traill completeness

These scores implement the scoring algorithm described in the following paper:

Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828

Their approach calculates the quality of ebook records coming from different data sources.

histogram

  • y axis: number of records
  • x axis: total score of a record
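The histogram is a tally of how many records share each total score. A minimal sketch of that tally follows; the function name `score_histogram` and the example scores are illustrative assumptions, not part of the catalogue's code or data.

```python
from collections import Counter


def score_histogram(total_scores):
    """Map each total score (x axis) to the number of records with that score (y axis)."""
    counts = Counter(total_scores)
    # Sort by score so the result reads left to right like the plotted x axis.
    return dict(sorted(counts.items()))


# Made-up total scores for illustration only, not taken from the catalogue.
example_scores = [6, 9, 6, 12, 9, 6]
histogram = score_histogram(example_scores)
```

Each key of `histogram` corresponds to one bar position and each value to the bar height.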
Each record gets a score based on a number of criteria. Each criterion can only add points, never subtract them. The final score is the sum of the scores for the individual criteria.
Record Element | MARC field/position/subfield | How counted
1. ISBN | 020 | 1 point for each occurrence of field
2. Authors | 100, 110, 111 | 1 point for each occurrence of field(s)
3. Alternative Titles | 246 | 1 point for each occurrence of field
4. Edition | 250 | 1 point for each occurrence of field
5. Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s)
6. Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s)
7. Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists
8. Date (MARC 008) | 008/7-10 | 1 point if valid coded date exists
9. Date (MARC 26X) | 260$c or 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date
10. LC/NLM Classification | 050, 060, 090 | 1 point if any field exists
11. Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651 second indicator 0 | 1 point for each field up to 10 total points
12. Subject Headings: MeSH | 600, 610, 611, 630, 650, 651 second indicator 2 | 1 point for each field up to 10 total points
13. Subject Headings: FAST | 600, 610, 611, 630, 650, 651 second indicator 7, $2 fast | 1 point for each field up to 10 total points
14. Subject Headings: GND (this was not part of the original algorithm) | 600, 610, 611, 630, 650, 651 second indicator 7, $2 gnd | 1 point for each field up to 10 total points
15. Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 if above criteria are not met | 1 point for each field up to 5 total points
16. Description | 008/23=o and 300$a “online resource” | 2 points if both elements exist; 1 point if either exists
17. Language of Resource | 008/35-37 | 1 point if likely language code exists
18. Country of Publication Code | 008/15-17 | 1 point if likely country code exists
19. Language of Cataloging | 040$b | 1 point if either no language is specified, or if English is specified
20. Descriptive cataloging standard | 040$e | 1 point if value is “rda”
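
To illustrate how the criteria combine, here is a minimal sketch that scores a record on a subset of the rows (1, 2, 7, 10, 11 and 20). It assumes a simplified, hypothetical record layout, a list of `(tag, second_indicator, subfields)` tuples, which is not the data model of the catalogue or of any particular MARC library. The names `count_tags` and `score_record` are likewise made up for this example.

```python
# Hypothetical, simplified record: a list of (tag, second_indicator, subfields) tuples.
SUBJECT_TAGS = {"600", "610", "611", "630", "650", "651"}


def count_tags(record, tags):
    """Number of fields in the record whose tag is in `tags`."""
    return sum(1 for tag, _ind2, _subfields in record if tag in tags)


def score_record(record):
    """Score a record on a subset of the criteria (rows 1, 2, 7, 10, 11, 20)."""
    scores = {}
    scores["isbn"] = count_tags(record, {"020"})                   # row 1
    scores["authors"] = count_tags(record, {"100", "110", "111"})  # row 2
    # Row 7: one point per field type present (505, 520), so 0, 1 or 2.
    scores["toc_abstract"] = sum(
        1 for t in ("505", "520") if count_tags(record, {t}) > 0
    )
    # Row 10: a single point if any classification field exists.
    scores["classification"] = 1 if count_tags(record, {"050", "060", "090"}) else 0
    # Row 11: LC subject headings (second indicator 0), capped at 10 points.
    lc = sum(1 for tag, ind2, _ in record if tag in SUBJECT_TAGS and ind2 == "0")
    scores["lc_subjects"] = min(lc, 10)
    # Row 20: one point if 040$e is "rda".
    scores["rda"] = 1 if any(
        tag == "040" and sub.get("e") == "rda" for tag, _, sub in record
    ) else 0
    # The total is the plain sum of the criterion scores computed above.
    scores["total"] = sum(scores.values())
    return scores


# Made-up record for illustration only.
example_record = [
    ("020", " ", {"a": "9780000000002"}),
    ("040", " ", {"b": "eng", "e": "rda"}),
    ("100", " ", {"a": "Doe, Jane"}),
    ("520", " ", {"a": "Summary"}),
    ("650", "0", {"a": "Libraries"}),
    ("650", "0", {"a": "Metadata"}),
]
example_scores = score_record(example_record)
```

The remaining rows follow the same pattern: either a per-occurrence count, a capped count, or a single point for a condition being met.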

components

The histograms of the individual components: