Computer Applications and Quantitative Methods in Archaeology

Fourth meeting of the UK chapter

Cardiff University, Wales, February 27-28 1999

Numismatic Database with Icon and String-Searching Features

 

Leonel Morgado, Mário Guedes

 

GeIRA Project

University of Trás-os-Montes e Alto Douro - Computer Centre

Apt. 202, 5001 Vila Real Codex, Portugal

Phone +351 (59) 320356       Fax: +351 (59) 320480           E-mail: mailto:leonelm@utad.pt

Abstract

The Museum of Vila Real has a prized collection of over 50,000 ancient coins—the result of many years of devoted effort by Father João Parente, who offered the town this cache. Parente (1997) published part of the collection in a tome containing numismatic data for 5.000 coins and photographs of 1100 of these.

N.

Type

OBVERSE

REVERSE

MARK

WEIGHT

AXIS

DIAMETERS

ORIGIN

REFERENCES

NOTES

3RD GROUP
1st officina

2220

Aes 4

D N CONSTAN-TINVS P F AVG

Diademed head with laurels and rosettes, to the right

GLOR-IA EXERC-ITVS

Two soldiers and a standard

   
SMNA

1.074

12 h

15/15.6

Vila Marim

RIC 7;

LRBC 1134

Figure 1: Numismatic data from Father Parente’s book, published in 1997 (translated)

A database of the numismatic records allows automated information processing prior to making details available on the museum Web site. The "MARK" column (from here on called "marks") contains a series of schemas with distinctive letters or scripts that posed a major problem in creating the database. Besides this, although not directly associated with the database, there is a minor problem in presenting several schemas and symbols in an HTML page without manually inserting code for each coin.

Storing the marks in the database

Most marks consist of letters set in a frame. For instance, in figure 1, the string "SMNA" under a horizontal line forms a mark; however, a large number of coins have symbols among the letters (figure 2).

             

Figure 2: Some symbols found in the coins' marks

One of the first approaches was to store the marks as images but there were several drawbacks, as follows: -

·         scanning would be a tedious process, lacking enough similarities to automate the process

·         the book binding would have to be destroyed to get a regular orientation of the marks

·         the extension of each coins' text information (description of faces, notes, etc.) varies greatly, making it difficult to vertically locate a mark on any given page

·         storing the marks as images doesn't allow for text or symbol searches


Analysis of the structure of the marks' schemas showed the symbols were always used as letters and could be replaced by alphanumeric characters. A random sampling of several dozen marks revealed four different schemas, indicating the total number of schemas was low enough to have a specific processing for each type. Figure 3 displays the four types of schema found, the letter "A" representing various strings.

The book contained few non-roman coins. These non-roman coins had no schemas registered in the book (for these, the book contained a single horizontal line, "-", in the "MARK" column). Therefore, we have not tried to deal with the problem of registering non-roman schemas. However, as long as the number of schemas is low enough to allow specific processing for each one, this system can be expanded to accommodate them. This would be quite easy to achieve: imagine a 5th kind of schema, with N strings, each occupying a particular position. A specific processing would be developed for it, which would take care of the organisation and display issues. The string association would be accomplished simply inserting the N strings in the Fields_Marca table. More information is available in the "Database structure" table.

 

 

 

 

 

 

A

A

 

 

 

 

 

 

 

 

A

A

 

 

A

A

 

 

 

A

A

 

 

 

A

A

 

 

 

A

A

 

 

A

A

 

 

 

A

A

 

 

 

A

A

 

 

 

A

A

 

1 upper string;

1 lower string

 

2 upper strings;

1 lower string

 

2 upper strings split by divider;

1 lower string

 

4 upper strings split by divider;

1 lower string

Figure 3: four schemas found in random sample

Each schema has its own organisation and a specific number of alphanumeric strings that is reflected in the database structure, with three tables addressing this issue, as follows: -

·         Moedas (Coins), containing the basic properties of the coins

·         Formato_marca (Mark Format), containing the schemas of the marks' strings

·         Fields_Marca (Mark Fields), containing the strings used in a mark's schema

In terms of relationships, each coin (an item in the Moedas table) is associated with a schema (an item in the Formato_marca table) and with a variable number of strings (items in the Fields_Marca table). This allows the creation of a data entry and query form for marks. As the mark types are limited, the form automatically adapts for each by drawing the divider elements and creating or eliminating fields for string insertion and display. The actual structure of the tables is described further on in this document, in the "Database structure" section.

Figure 4: coin-mark insertion form

At this stage, the issue was limited to the matter of symbols. The typesetter’s files yielded a full set of binary image symbols. There are 96 altogether (laurel wreaths, diamonds, mirrored letters, aggregated letters, etc.). These 96 symbols, the 26 letters of the alphabet and the 10 numeric characters give 132 different items that can be displayed by using single-byte fonts.

The option was the creation of a new font set which included characters found in the coin-marks (dropping ã, Ç, £, §, and others). This uses considerably less storage space than storing images, and allows for fast search of mark components. The data insertion process itself is much faster than any current optical scanning process.

The font is a TrueType vector font, created using the Corel Draw package from the Canadian company Corel. Corel Draw's online help contains full information on how to use it for TrueType font creation. This font is available for use by interested third parties, a simple e-mail request to the author sufficing.

Database operators perform the mark definition easily because the coin-mark insertion form includes an active table with all available symbols. An operator just has to use the keyboard for normal characters and make mouse clicks on appropriate icons for symbols (figure 4).

A similar form can be used to define searches on the database for instances of a particular symbol. Please see this document's "Database querying and longevity" section for more details.

Regarding future expansion of the method, should a particular situation demand more than the 256 characters available on single-byte fonts, it is technically possible to convert the system to Unicode two-byte fonts. This kind of font encoding allows for 65636 characters per font, and is commonly used in the computer science field for encoding oriental languages. Of course, the interface would also have to be redesigned to allow useful organisation of such a vast collection of symbols. But these issues go beyond the aim of the current paper.

Database structure

Figure 5: the database structure

The database was designed not as a definitive concept, but rather as a support for the computer archive of the book's information, in a way that would allow its further refinement and processing, or import into specialised databases.

For this purpose, the database design was normalised in the Boyce-Codd normal form. Information about this kind of normalisation is readily available in most college manuals on database design. Gupta (1996) authored a site with normalisation information, including the Boyce-Codd normal form. Its URL can be found in the "References" section. The resulting database organisation is presented in figure 5.

As an example, no special codes or text are used to indicate that a particular coin is a denarius or some other kind of Roman coin. A table ("Tipos") holds a list of types, including the denarius, and assigns each a unique numerical identifier, that is used as the key value in the relationship between the coins table ("Moedas") and the types table. Therefore, assume you need to convert a coin to another database: looking at the "Moeda" table, you get a type value of "1". Checking the table "Tipos", you get the textual information (table 1).

ID

Description

1

Denarius

...

...

Table 1: snapshot of the "Tipos" table.

This immediately tells you that all coins labelled with type "1" are denarius. Therefore, a conversion table, of the look-up kind, is easily constructed. Common SQL queries can then be used to convert the database to any other normalised format. Even in the case of a format that falls short of Boyce-Codd normalisation, conversion is possible, by means of developing a program to perform the conversion; what matters is that the information structure allows these conversions to be performed automatically, with no need for human intervention at the record level.

On another perspective, should the need arise to associate this mark-storage method to another database as Terence Volk's (1994-95), for example, it can easily be done: all that is required is to create a matching table between the original database and the mark-storage method tables. Terence Volk's database structure is a case of a database that is not normalised, and therefore would require programming to convert this paper's database information into it. This would be centred in the generation of the identification strings used by Volk. Lookup tables and conversion procedures can be constructed to convert the numeric codes used here to the substrings used in Volk's identification strings, such as "ROMN" for Roman coins, "rpb" for republic, and so forth. An interesting project would be, on the other hand, converting Volk's database to a normalised format. This is, however, beyond the scope of this paper.

The tables that address the mark-schema storage issue, being a novelty, require further explanation. The following tables (table 2 and table 3) present their internal structure:

Fields_marca

FIELD NAME

TYPE

DESCRIPTION

MarcaFields

Numeric

Unique ID field.

TextFields

Text

Text string to be displayed.

FieldsPos

Numeric

Ordinal position of the field within a schema.

FormType

Numeric

Schema in which this field is used.

IDMoeda

Numeric

ID of the coin this field

Table 2: internal structure of the "Fields_marca" table.


 


Formato_marca

FIELD NAME

TYPE

DESCRIPTION

FormatoID

Numeric

Unique ID field.

FormDesc

Text

Text description of the schema.

FormFields

Numeric

Number of fields within the schema.

FormImg

OLE Object / Binary Data

Image associated with the schema (for background in visual forms, also useful as an extra description, etc.).

Table 3: internal structure of the "Formato_marca" table.

To clarify the usage of these tables, let us resort to an example, the representation of a fictitious mark (figure 6).

 

 

 

 

CM

B

 

AMD

 

Figure 6: fictitious mark used in the example.

In this example, there are 3 strings in the mark. These will have three entries in the Fields_marca table (table 4).

MarcaFields

TextFields

FieldsPos

FormType

IDMoeda

1

CM

1

3

5436

2

B

2

3

5436

3

AMD

3

3

5436

...

...

...

...

...

Table 4: the "Fields_marca" entries for the example.

In this example, the FormType column indicates that the string "1" has the value "CM", is in the first position of the schema number 3 and belongs to coin number 5436. The same kind of reading can be applied to strings "2" and "3".

The schema 3 would be represented as shown on table 5.


 


FormatoID

FormDesc

FormFields

FormImg

...

...

...

...

3

2 upper strings split by divider;

1 lower string

3

...

...

...

...

Table 5: the "Formato_marca" entry for the example.

There is a technical issue here, regarding the storage of images in databases. Our database was developed in Microsoft Access 97, so we used its internal "OLE Object" field. However, for our Web pages we use a more powerful system, Microsoft SQL Server, which does not possess that data type. But since what we need to display an image is the binary data that composes it, that is exactly what we store in the SQL Server database: the bytes that compose a GIF image file. This image format is one of the most used in the world, being supported by all graphical Web browsers and most image-editing software.

Another simple option would be the storage of filenames. We did not follow this option, however, since keeping the entire information in the database provides more reliability regarding its longevity: files detached from the main database can easily be corrupted, lost, etc.

Database querying and longevity

Since the database is normalised, it can be queried using standard SQL (Structured Query Language) commands (ISO 1992). This allows from simple to complex queries to be created. The most practical way of using them is to define graphical interfaces, such as forms in Microsoft Access 97 or Web-based forms, that execute some specific kind of query.

For instance, if one wants to retrieve all coins struck by the second officina of the mint of Milan, an interface can be developed to behave in the following fashion:

Let us assume that Milan has ID 4 in the mint table ("Ceca"). In this case, the coins from Milan have the number "4" in the CecaID field. Therefore, this query would retrieve the ID numbers of the desired coins (bold lettering indicates SQL commands):

SELECT MoedaID FROM moedas WHERE CecaID=4 AND Officina=2;

Of course, this would be best performed through means of a form. The form would contain a list box or a clickable map, to allow the user to choose the mint; and an edit box, to allow the user to enter the officina number. The form code would then generate the SQL command string above, and query the various database tables to present the desired information (i.e., the emperor name, not its code, and the visual mark, not its coding).

Now, suppose that you want to find all the instances of coins with a particular symbol. By using a form such as the one in figure 4, it's easy to select the desired symbol. Since a character in the font represents that symbol, say "§", a SQL query can easily retrieve the desired information:

SELECT DISTINCT IDMoeda FROM Fields_marca WHERE TextFields LIKE '%§%';

This query would retrieve all ID numbers of the coins in whose marks the symbol represented by "§" turns up. Notice that the retrieved field is IDMoeda, from the Fields_marca table (which is the table being queried).

The construction of this kind of queries is extremely simple in database interfaces such as Microsoft's Access 97 and other visual interface, since the query is constructed visually in an intuitive fashion.

Regarding data longevity, all the fields are readily imported into most modern databases, since they use only common data formats: the alphanumeric text fields are short, all numbers are integers, etc.

The only real issue is posed by the storage of schema's images in the Formato_marca table. No standard data format exists for images, each software vendor providing a different alternative. However, most databases allow raw binary data to be stored; this allows storage of images as GIF or JPEG files, storing the files themselves in the database. This allows the data to be, in the most extreme case, extracted from the database and placed in a file, for usage in some other fashion.

Another issue is the risk of losing the actual font, which, in fact, is the only element connecting a symbol to the character that is stored in the strings of the Fields_marca table.

We have not addressed this problem, but some possible approaches have occurred to us; these may in the future yield a solution:

·         Creation of a table associating text characters with descriptions of the symbols they represent and raw binary data images of each.

·         Storage of one or several fonts in a database table. This would allow different single-byte fonts to be used for different sets of coins or marks. An extra column could be added to the Fields_marca table, associating a field with the font that should be used to display it.

Showing it on the Web

The special font would not be suited for the Web because different systems have diverse font font-support capabilities. Apart from this, new trends in TV set top boxes may result in millions of new Web users that have limited or no font-downloading capabilities at all. As the symbols are monochrome and quite small, GIF images of each would download rapidly. Thus, this was the solution.

The server-side scripting used for the database queries detects when a particular character requires an image and includes it where necessary in the HTML code. The Museum’s Web site uses Active Server Page scripts but this technique can be employed with any web-scripting technology (CGI, Perl, etc.).

Acknowledgements

This project is sponsored by the EU programmes Interreg II (02/REGII/6/96), Feder and, the FCT (Foundation for Science and Technology).

References

GUPTA, Gopal. 1996. Normalisation, James Cook University of North Queensland, Australia (http://www.cs.jcu.edu.au/ftp/web/teaching/Subjects/cp1500/1997/Lecture_Notes/normalisation/contents.html).

ISO - International Organization for Standardization, 1992. Information technology -- Database languages -- SQL, ISO/IEC 9075:1992, Geneva, Switzerland.

PARENTE, João. 1997. Museu de Vila Real - Moedas - Tomo I, Câmara Municipal de Vila Real, Vila Real, Portugal.

VOLK, Terence, 1995. Retroconversion and the numerical analysis of Roman republican coin-hoards, part I, Rivista Italiana di Numismatica Vol. XCVI, 1995-95, pp. 105-186, Società Numismatica Italiana, Milan, Italy