# Create a new parser

We would like to convert data from many sources into BibJSON, but there is a limit to how many parsers we can write ourselves. However, you can write your own parser for use in your own installation of BibServer, or submit it to us for inclusion in our software repository.

All the BibServer parsers can be found in bibserver/parsers in our repository. The exemplar parser that we have been developing from is our bibtex parser, and the structure it follows can be used to build more parsers:

## start your parser script

Begin your script with any required imports, a sensible class name, and an `__init__` method:

    import string

    class MyFormatParser(object):

        def __init__(self):
            '''do stuff required to initialise the parser'''

## the parse method

Your parser must have a parse method that accepts a file object to iterate through. This method must return a list of BibJSON objects, each one representing a record in your collection.

        def parse(self, fileobj):
            '''given a file object, parse it for records in your format,
            convert them to BibJSON and return them as a list'''
            records = []
            record = ""
            for line in fileobj:
                # do whatever must be done to get stuff out of your
                # format and into BibJSON, e.g. collect lines in `record`
                # and append each finished record to `records`
                pass

            return records
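As a rough illustration, here is a complete parse method for a hypothetical format in which each record is a block of `key: value` lines and records are separated by blank lines. The format itself, the field handling, and the assumption that each BibJSON record is a plain Python dict with `author` stored as a list of `{"name": ...}` objects are illustrative choices, not taken from BibServer's own parsers.

```python
class MyFormatParser(object):
    '''parser for a hypothetical format: one "key: value" pair per
    line, with records separated by blank lines'''

    def __init__(self):
        # this simple format needs no set-up
        pass

    def parse(self, fileobj):
        '''given a file object, convert each record into a BibJSON
        dict and return them all as a list'''
        records = []
        record = {}
        for line in fileobj:
            line = line.strip()
            if not line:
                # a blank line ends the current record, if any
                if record:
                    records.append(record)
                    record = {}
                continue
            key, sep, value = line.partition(':')
            if not sep:
                # ignore lines that are not "key: value" pairs
                continue
            key, value = key.strip().lower(), value.strip()
            if key == 'author':
                # assumed BibJSON shape: a list of {"name": ...} objects
                record.setdefault('author', []).append({'name': value})
            else:
                record[key] = value
        # the last record may not be followed by a blank line
        if record:
            records.append(record)
        return records
```

Passing it an open file (or an `io.StringIO`) containing two such blocks returns a list of two dicts.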

## parse_record

Our bibtex parser actually calls a parse_record method from within the parse method: parse reads the file object and pulls out the individual records, then passes each record to parse_record. This split is not mandatory, but you may find it useful, because it also gives you a way to pass a single record in your format to your parser for parsing, instead of a whole file object.
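A minimal sketch of that split, for a hypothetical format of blank-line-separated blocks of `key: value` lines (not a real BibServer format): parse only finds record boundaries, and parse_record turns one raw record string into a BibJSON dict, so it can also be called on its own. Representing BibJSON records as plain Python dicts is an assumption here.

```python
class MyFormatParser(object):
    '''hypothetical parser that splits parsing into two steps'''

    def parse(self, fileobj):
        '''pull raw records out of the file object and pass each
        one to parse_record'''
        records = []
        lines = []
        for line in fileobj:
            if line.strip():
                lines.append(line)
            elif lines:
                # a blank line closes the current raw record
                records.append(self.parse_record(''.join(lines)))
                lines = []
        # handle a final record with no trailing blank line
        if lines:
            records.append(self.parse_record(''.join(lines)))
        return records

    def parse_record(self, record):
        '''convert one record, given as a string in the hypothetical
        format, into a BibJSON dict'''
        bibjson = {}
        for line in record.splitlines():
            key, sep, value = line.partition(':')
            if sep:
                bibjson[key.strip().lower()] = value.strip()
        return bibjson
```

Because parse_record takes a plain string, you can check a single record quickly with something like `MyFormatParser().parse_record("title: A")`.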

## more methods

All the remaining methods in our bibtex parser exist to support the parse_record method. They do things like sanitising the input data and changing the format of particular bibtex keys to meet the requirements of BibJSON. The parser also ends with a very long list of conversions from LaTeX to Unicode: LaTeX defines its own ways of representing certain characters (such as accented letters) in LaTeX / bibtex files, but BibJSON needs them as Unicode (UTF-8) text. You may need to do something similar if your format contains custom character representations like these.
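The sketch below shows the general idea with a tiny conversion table and a sanitising helper. The handful of entries and the names `LATEX_TO_UNICODE`, `latex_to_unicode` and `sanitise` are hypothetical and for illustration only; a real table, like the one in our bibtex parser, covers far more commands.

```python
import re

# Hypothetical, deliberately short LaTeX-to-Unicode table; a real one
# would cover many more commands and spelling variants.
LATEX_TO_UNICODE = {
    r'{\"a}': 'ä',
    r'{\"o}': 'ö',
    r'{\"u}': 'ü',
    r"{\'e}": 'é',
    r'{\`e}': 'è',
    r'{\c c}': 'ç',
    r'{\ss}': 'ß',
}


def latex_to_unicode(text):
    '''replace known LaTeX character commands with Unicode characters'''
    for latex, char in LATEX_TO_UNICODE.items():
        text = text.replace(latex, char)
    return text


def sanitise(value):
    '''convert LaTeX characters, collapse whitespace, drop stray braces'''
    value = latex_to_unicode(value)
    value = re.sub(r'\s+', ' ', value)
    return value.replace('{', '').replace('}', '').strip()
```

Converting the LaTeX commands before stripping braces matters: removing the braces first would break the `{\"u}`-style patterns the table looks for.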

## write a test

You should test your new parser by instantiating it, passing it an example file, and checking that the output is as expected. See our tests in our repo for examples.
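A minimal test along these lines might look like the following, using Python's standard unittest module. The `MyFormatParser` class here is a hypothetical stand-in that reads blank-line-separated blocks of `key: value` lines; in practice you would import your own parser and point it at a real example file instead of an in-memory one.

```python
import io
import unittest


class MyFormatParser(object):
    '''hypothetical stand-in for your real parser; replace it with an
    import of your own parser module'''

    def parse(self, fileobj):
        records = []
        for block in fileobj.read().split('\n\n'):
            pairs = [line.partition(':') for line in block.splitlines()]
            record = {k.strip(): v.strip() for k, sep, v in pairs if sep}
            if record:
                records.append(record)
        return records


class TestMyFormatParser(unittest.TestCase):

    def test_parse_example_file(self):
        # an in-memory "file" stands in for an example file on disk
        example = io.StringIO("title: A\nyear: 2010\n\ntitle: B\n")
        records = MyFormatParser().parse(example)
        self.assertEqual(len(records), 2)
        self.assertEqual(records[0], {'title': 'A', 'year': '2010'})
        self.assertEqual(records[1]['title'], 'B')
```

Saved in a file whose name starts with `test`, this can be run with `python -m unittest` from the directory containing it.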