| EBook-Tools documentation | view source | Contained in the EBook-Tools distribution. |
EBook::Tools::IMP - Object class for manipulating the SoftBook/GEB/REB/eBookWise .IMP and .RES e-book formats
use EBook::Tools::IMP qw(:all);
my $imp = EBook::Tools::IMP->new();
$imp->load('myfile.imp');
new($filename)
Instantiates a new EBook::Tools::IMP object. If $filename is
specified, it will also immediately initialize itself via the load
method.
load($filename)
Loads a .imp file, parsing it into the various object attributes. Returns 1 on success, or undef on failure.
load_resdir($dirname)
Loads a .RES resource directory, parsing it into the object
attributes. Returns 1 on success, or undef on failure.
author()
bookproplength()
Returns the total length in bytes of the book properties data, including the trailing null used to pack the C-style strings, but excluding any ETI server data appended to the end of the standard book properties.
filecount()
Returns the number of resource files as stored in
$self->{filecount}. Note that this does NOT recompute that value
from the actual number of resources in $self->{resources}. To do
that, use create_toc_from_resources().
find_image_type($id,@excluded)
Goes through all stored images searching for one with the specified id
value, returning the first image type found or undef if there were no
matches or if no image id was specified. If the optional argument
@excluded is specified, any types in the list will be skipped
during the search.
Expected types are 'png', 'jpg', 'gif', and 'pic', searched for in that order.
This can be used to attempt to locate an alternate image for an undisplayable PICT image.
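The search order and exclusion list described above can be illustrated with a minimal sketch. It does not use the module itself: the %store hash and the find_image_type_sketch() name are hypothetical stand-ins for the object's per-type image hashes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object: one hash of images per type,
# each keyed by image ID.
my %store = (
    jpg => { 1 => { data => 'jpeg bytes' } },
    pic => { 1 => { data => 'pict bytes' },
             2 => { data => 'pict bytes' } },
);

# Return the first type holding $id, searching in the documented order
# and skipping any type listed in @excluded.
sub find_image_type_sketch {
    my ($store, $id, @excluded) = @_;
    return unless defined $id;
    my %skip = map { $_ => 1 } @excluded;
    for my $type (qw(png jpg gif pic)) {
        next if $skip{$type};
        next unless $store->{$type};
        return $type if exists $store->{$type}{$id};
    }
    return;
}

# Look for a displayable alternative to PICT image 1.
my $alternate = find_image_type_sketch(\%store, 1, 'pic');
print "alternate type: $alternate\n" if defined $alternate;
```

Excluding 'pic' here mirrors the stated use case of locating a replacement for an undisplayable PICT image.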
find_resource_by_name($name)
Takes as a single argument a resource name and if a resource with that
name exists in $self->{resources} returns the resource type
used as the hash key.
Returns undef if no match was found or a name was not specified.
image($type,$id)
Returns the image data stored in the resource of the specified type
(specifically, stored in $self->{$type}->{$id}->{data} as
parsed from the JPEG resource) corresponding to the 16-bit identifier
provided as $id.
Valid values for $type are 'gif','jpg', and 'png'.
Carps a warning and returns undef if $type is not provided or is
not valid, or if $id is not provided.
image_hashref($type,$id)
Returns the raw object hashref used to store parsed image data for the
specified type, as stored in $self->{$type}. Valid types are
'gif', 'jpg', and 'png'.
Carps a warning and returns undef if $type is not provided or is
not valid.
If $id is not specified, the keys of the returned hash are the
image IDs for the specified image type, and the values are hashrefs
pointing to hashes containing the following keys:
unknown
A 16-bit integer only available on EBW 1150 resources. Use with caution. This key may be renamed if more information is found.
length
The length of the actual image data.
offset
The byte offset inside of the raw resource data in which the JPEG image data can be found.
const0
An unknown value, but it appears to always be zero. Use with caution. This key may be renamed if more information is found.
If the optional argument $id is specified, only the hash for that
specific ID is returned, rather than the entire hash of hashrefs.
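As a rough illustration of the data shape described above, the following sketch builds a hash of hashrefs keyed by image ID and returns either the whole hash or a single entry. The $object data and the image_hashref_sketch() name are hypothetical; only the key names come from the list above.

```perl
use strict;
use warnings;
use Carp;

my %VALID_TYPE = map { $_ => 1 } qw(gif jpg png);

# Hypothetical object data: per-type hashes keyed by image ID.
my $object = {
    jpg => {
        1 => { length => 4, offset => 32, const0 => 0,
               data => "\xFF\xD8\xFF\xD9" },
    },
};

# Return the whole per-type hash, or just one entry if $id is given.
sub image_hashref_sketch {
    my ($self, $type, $id) = @_;
    unless (defined $type && $VALID_TYPE{$type}) {
        carp "invalid or missing image type";
        return;
    }
    my $images = $self->{$type} || {};
    return defined $id ? $images->{$id} : $images;
}

my $all = image_hashref_sketch($object, 'jpg');
print "image IDs: ", join(', ', sort keys %$all), "\n";
```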
image_ids($type)
Returns a list of the 16-bit integer IDs of the specified type of
image data stored in the associated resource (specifically, stored in
$self->{$type} as parsed from the JPEG resource).
Valid types are 'gif', 'jpg', and 'png'. The method will carp a warning and return undef if another type is specified, or no type is specified.
is_1150()
Returns 1 if $self->{device} == 2, 0 if it is some other value, and
undef if it is undefined. This matters because resources packed for
an EBW 1150 or GEB 1150 are in a different format than resources
packed for other IMP readers.
offsetelement($offset)
Returns the text of the element corresponding to the given text offset
as stored in $self->{offsetelements}, or undef if no such
element exists.
pack_imp_book_properties()
Packs object attributes into the 7 null-terminated strings that constitute the book properties section of the header. Returns that string.
Note that this does NOT pack the ETI server data appended to this section in encrypted books downloaded directly from the ETI servers, even if that data was found when the .imp file was loaded. This is because the extra data can confuse the GEBLibrarian application, and is not needed to read the book. The bookproplength() and pack_imp_header() methods also assume that this data will not be present.
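The packing of seven null-terminated strings can be sketched with Perl's pack and the 'Z*' template, which appends a terminating null to each string. This is only an illustration: the field order is an assumption taken from the key order listed for set_book_properties(), and pack_props_sketch() is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed field order, borrowed from the set_book_properties() keys.
my @FIELDS = qw(identifier category subcategory title
                lastname middlename firstname);

# Pack each property as a C-style string; missing values become empty
# strings, which still contribute one null byte each.
sub pack_props_sketch {
    my (%props) = @_;
    return pack('Z*' x @FIELDS, map { $props{$_} // '' } @FIELDS);
}

my $packed = pack_props_sketch(
    title     => 'My Best Book',
    category  => 'Fiction',
    firstname => 'John Q. Public',
);

# The length of this string corresponds to what bookproplength()
# is described as returning (terminators included, no ETI data).
printf "%d bytes packed\n", length $packed;
```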
pack_imp_header()
Packs object attributes into the 48-byte string representing the IMP header. Returns that string on success, carps a warning and returns undef if a required attribute did not contain valid data.
Note that in the case of an encrypted e-book with ETI server data in it, this header will not be identical to the original -- the resdiroffset value is recalculated for the position with the ETI server data stripped. See bookproplength() and pack_imp_book_properties().
pack_imp_resource(%args)
Packs the specified resource stored in $self->{resources} into
a data string suitable for writing into a .imp file, with a header
format determined by $self->{version}.
Returns a reference to that string if the resource was found, or undef if it was not.
name
Select the resource by resource name.
If both this and type are specified, the type is checked first and
the name is only used if the type lookup fails.
type
Select the resource by resource type. This is faster than selecting by name (since resources are stored in a hash keyed by type) and is recommended for most use.
If both this and name are specified, the type is checked first and
the name is only used if the type lookup fails.
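The lookup precedence described for the name and type arguments (type first, name only as a fallback) might look like the following sketch. The resource names and the select_resource_type() helper are hypothetical and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical resource store keyed by four-character type.
my %resources = (
    'JPEG' => { name => 'Pic1', data => 'abcd' },
    'ImRn' => { name => 'ImRn', data => 'xy' },
);

# Return the hash key (type) of the selected resource, or nothing.
sub select_resource_type {
    my ($resources, %args) = @_;

    # Direct hash lookup by type is tried first.
    if (defined $args{type} && exists $resources->{ $args{type} }) {
        return $args{type};
    }

    # Fall back to a linear scan by name.
    if (defined $args{name}) {
        for my $type (sort keys %$resources) {
            my $name = $resources->{$type}{name} // '';
            return $type if $name eq $args{name};
        }
    }
    return;
}

my $found = select_resource_type(\%resources, type => 'XXXX', name => 'Pic1');
print "selected type: $found\n" if defined $found;
```

The scan by name has to visit every stored resource in the worst case, which is why selection by type is described as the faster option.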
pack_imp_rsrc_inf()
Packs object attributes into the data string that would be the content of the RSRC.INF file. Returns that string.
pack_imp_toc()
Packs the $self->{toc} object attribute into a data string
suitable for writing into a .imp file. The format is determined by
$self->{version}.
Returns that string, or undef if valid version or TOC data is not found.
resdirbase()
In scalar context, this returns the basename of $self->{resdirname}.
In list context, it actually returns the basename, directory, and
extension as per fileparse from File::Basename.
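The context-sensitive return can be illustrated with fileparse from File::Basename and wantarray. This is a sketch only: the suffix pattern passed to fileparse and the resdirbase_sketch() name are assumptions, not necessarily what the module uses.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Return just the basename in scalar context, or the full
# (basename, directory, suffix) list in list context.
sub resdirbase_sketch {
    my ($resdirname) = @_;
    my @parts = fileparse($resdirname, qr/\.[^.]*/);
    return wantarray ? @parts : $parts[0];
}

my $base = resdirbase_sketch('books/mybook.RES');
my ($name, $dir, $ext) = resdirbase_sketch('books/mybook.RES');
print "base=$base dir=$dir ext=$ext\n";
```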
resdirlength()
Returns the length of the .RES directory name as stored in
$self->{resdirlength}. Note that this does NOT recompute the
length from the actual name stored in $self->{resdirname} --
for that, use set_resdirlength().
resdirname()
Returns the .RES directory name stored in $self->{resdirname}.
resource($type)
Returns a hashref containing the resource data for the specified
resource type, as stored in $self->{resources}->{$type}.
Returns undef if $type is not specified, or if the specified type
is not found.
resources()
Returns a hashref of hashrefs containing all of the resource data
keyed by type, as stored in $self->{resources}.
text()
Returns the uncompressed text originally stored in the DATA.FRK
('    ') resource. This will only work if the text was unencrypted.
title()
Returns the book title as stored in $self->{title}.
tocentry($index)
Takes as a single argument an integer index to the table of contents
data stored in $self->{toc}. Returns the hashref corresponding
to that TOC entry, if it exists, or undef otherwise.
version()
Returns the version of the IMP format used to determine TOC and
resource metadata size as stored in $self->{version}. Expected
values are 1 (10-byte metadata) and 2 (20-byte metadata).
write_images(%args)
Writes the images, if any, to the specified output directory.
Filenames are in the format JPEG_XXXX.jpg or PNG_XXXX.png where
XXXX is the image ID for that image type formatted as four
hexadecimal characters.
dir
The output directory in which to write the file. This will be created if it does not exist. Defaults to the basename of the stored resource directory (see also resdirname()).
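The filename pattern described for write_images() can be sketched with sprintf. Whether the module emits upper-case or lower-case hexadecimal digits is not stated above, so the upper-case '%04X' used here is an assumption, as is the image_filename_sketch() name.

```perl
use strict;
use warnings;

# Map the image type to the filename prefix named in the description.
my %PREFIX = (jpg => 'JPEG', png => 'PNG');

# Build PREFIX_XXXX.ext, where XXXX is the ID as four hex digits.
sub image_filename_sketch {
    my ($type, $id) = @_;
    return unless defined $type && defined $id && $PREFIX{$type};
    return sprintf('%s_%04X.%s', $PREFIX{$type}, $id, $type);
}

print image_filename_sketch('jpg', 0x1A2B), "\n";
print image_filename_sketch('png', 10), "\n";
```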
write_imp($filename)
Takes as a sole argument the name of a file to write to, and writes a .imp file to that filename using the object attribute data.
Returns 1 on success, or undef if required data (including the filename) was invalid or missing, or the file could not be written.
write_resdir()
Writes a .RES resource directory from the object attribute data,
using $self->{resdirname} as the directory name.
write_text(%args)
Writes the uncompressed text, if any, to the specified output directory and file.
dir
The output directory in which to write the file. This will be created if it does not exist. Defaults to the basename of the stored resource directory (see also resdirname()).
filename
The filename of the output file to write. If not specified, a warning will be carped and the method will return undef.
create_toc_from_resources()
Creates appropriate table of contents data from the metadata in
$self->{resources}, in the format specified by
$self->{version}. This will also set $self->{filecount}
to match the actual number of resources.
Returns the number of resources found.
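A rough sketch of the idea follows: TOC entries are derived from the stored resources, and filecount is refreshed at the same time. The entry layout (an array of hashrefs with name, type, and size keys) is an assumption based on the tocentry() and TOC parsing descriptions, and the resource names are hypothetical.

```perl
use strict;
use warnings;

# Rebuild a TOC from the resources hash and resynchronize filecount.
sub create_toc_sketch {
    my ($self) = @_;
    my @toc;
    for my $type (sort keys %{ $self->{resources} }) {
        my $res = $self->{resources}{$type};
        push @toc, {
            name => $res->{name},
            type => $type,
            size => length($res->{data}),
        };
    }
    $self->{toc}       = \@toc;
    $self->{filecount} = scalar @toc;
    return scalar @toc;
}

# Hypothetical object whose stored filecount is stale.
my $book = {
    filecount => 0,
    resources => {
        'JPEG' => { name => 'Pic1', data => 'abcd' },
        'ImRn' => { name => 'ImRn', data => 'xy' },
    },
};
my $count = create_toc_sketch($book);
print "resources: $count, filecount now $book->{filecount}\n";
```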
parse_eti_server_data($data)
Parses ETI server data, as potentially found appended to the end of .imp book properties or a RSRC.INF resource file on encrypted books downloaded directly from ETI servers.
Takes as a single argument a string containing just the extra appended
data, and stores the parsed values in $self->{etiserverdata} as
a hash. Note that parsing requires knowledge of the length of the
book properties at the time this data was inserted; if the book
properties have not been properly parsed or have been modified, the
resulting behaviour of this method is not defined.
Returns the number of bytes handled, zero if no data was provided.
The data has the following format and keys:
pad
unknown1
issuenumber
contentfeed
'SOURCE_ID:SOURCE_TYPE:None', where SOURCE_ID is usually '3' and SOURCE_TYPE is usually 'B'
unknown2
This value may not be present at all.
parse_imp_book_properties($propdata)
Takes as a single argument a string containing the book properties data. Sets the object variables from its contents, which should be seven null-terminated strings in the following order:
Identifier
Category
Subcategory
Title
Last Name
Middle Name
First Name
Note that the entire name is frequently placed into the "First Name" component, and the "Last Name" and "Middle Name" components are left blank.
In addition, ETI server data may be appended to this data on encrypted
books downloaded from ETI servers. If present, that data will be
stored in the hash $self->{etiserverdata}. See
parse_eti_server_data($data) for details.
A warning will be carped if the length of the parsed properties (including the C null string terminators) is not equal to the length of the data passed.
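Splitting seven null-terminated strings and applying the length check described above can be sketched with unpack and the 'Z*' template. The field order is an assumption taken from the key order listed for set_book_properties(), ETI server data is ignored, and parse_props_sketch() is a hypothetical name.

```perl
use strict;
use warnings;
use Carp;

# Assumed field order, borrowed from the set_book_properties() keys.
my @FIELDS = qw(identifier category subcategory title
                lastname middlename firstname);

sub parse_props_sketch {
    my ($propdata) = @_;
    my %props;

    # Each 'Z*' reads up to a null and then skips past it.
    @props{@FIELDS} = unpack('Z*' x @FIELDS, $propdata);

    # Count every string plus its terminator and compare with the input.
    my $parsed = 0;
    $parsed += length($_ // '') + 1 for @props{@FIELDS};
    if ($parsed != length($propdata)) {
        carp "parsed $parsed bytes but received " . length($propdata);
    }
    return \%props;
}

my $props = parse_props_sketch(
    "ID1\0Fiction\0\0My Best Book\0\0\0John Q. Public\0");
print "title: $props->{title}\n";
```

Any bytes left over after the seventh terminator would trip the length check, which is how appended ETI server data would show up in this simplified version.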
parse_imp_header()
Parses the first 48 bytes of a .IMP file, setting object variables. The method croaks if it receives any more or less than 48 bytes.
Version. Expected values are 1 or 2; the version affects the format of the table of contents header. If this isn't 1 or 2, the method carps a warning and returns undef.
Identifier. This is always 'BOOKDOUG', and the method carps a warning and returns undef if it isn't.
Unknown data, stored in $self->{unknown0x0a}. Use with caution
-- this value may be renamed if more information is obtained.
Number of included files, stored in $self->{filecount}.
Length in bytes of the .RES directory name, stored in
$self->{resdirlength}.
Offset from the point after this value to the .RES directory name,
which also marks the end of the book properties, stored in
$self->{resdiroffset}. Note that this is NOT the length of the
book properties. To get the length of the book properties, subtract
24 from this value (the number of bytes remaining in the header after
this point). It is also NOT the offset from the beginning of the file
to the .RES directory name -- to find that, add 24 to this value (the
number of bytes already parsed).
Unknown value, stored in $self->{unknown0x18}. Use with
caution -- this value may be renamed if more information is obtained.
Unknown value, stored in $self->{unknown0x1c}. Use with
caution -- this value may be renamed if more information is obtained.
Compression type, stored in $self->{compression}. Expected
values are 0 (no compression) and 1 (LZSS compression).
Encryption type, stored in $self->{encryption}. Expected
values are 0 (no encryption) and 2 (DES encryption).
Unknown value, stored in $self->{unknown0x28}. Use with
caution -- this value may be renamed if more information is obtained.
Unknown value, stored in $self->{unknown0x2A}. Use with
caution -- this value may be renamed if more information is obtained.
The upper nybble at this position is the IMP reader device for which the
e-book was designed, stored in $self->{device}. Expected values
are 0 (Softbook 200/250e), 1 (REB 1200/GEB 2150), and 2 (EBW
1150/GEB 1150).
The lower nybble marks the possible zoom states, stored in
$self->{zoomstates}. Expected values are 0 (both zooms), 1
(small zoom), and 2 (large zoom).
Unknown value, stored in $self->{unknown0x2c}. Use with
caution -- this value may be renamed if more information is obtained.
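Two pieces of arithmetic from the field list above lend themselves to a short sketch: splitting the shared byte into its device and zoom-state nybbles, and deriving the book properties length and absolute file offset from resdiroffset. The helper names are hypothetical, and the byte and offset values are made up for illustration.

```perl
use strict;
use warnings;

# Upper nybble: device. Lower nybble: zoom states.
sub split_device_byte {
    my ($byte) = @_;
    return (($byte >> 4) & 0x0F, $byte & 0x0F);
}

# resdiroffset is measured from the end of its own field, 24 bytes
# into the 48-byte header, so 24 header bytes remain after it.
sub resdir_positions {
    my ($resdiroffset) = @_;
    return (
        bookproplength => $resdiroffset - 24,
        file_offset    => $resdiroffset + 24,
    );
}

my ($device, $zoomstates) = split_device_byte(0x21);
my %pos = resdir_positions(64);
print "device=$device zoomstates=$zoomstates\n";
print "properties: $pos{bookproplength} bytes, ",
      ".RES name at file offset $pos{file_offset}\n";
```

As a consistency check, the 48-byte header plus the book properties length should equal the absolute file offset of the .RES directory name.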
parse_resource_cm()
Parses the !!cm resource loaded into $self->{resources},
if present, extracting the LZSS uncompression parameters into
$self->{lzssoffsetbits} and $self->{lzsslengthbits}.
Returns 1 on success, or undef if no !!cm resource has been loaded
yet or the resource data is invalid.
parse_resource_images()
Parses the image data resources loaded into $self->{resources},
if present, placing the image data and metadata of each image found
into $self->{jpg} and $self->{png}, keyed by 16-bit
image resource ID.
Returns the total number of images found and parsed.
This method is called automatically by load() and load_resdir().
See also accessor methods image($type,$id) and image_hashref($type,$id).
parse_resource_imrn()
Parses the index of text offsets to all images as stored in
$self->{resources}->{'ImRn'}, if present, storing them in
$self->{imrn} as a hash of hashrefs indexed by its
32-bit integer offset to the 0x0F control code in the uncompressed
text stored in the DATA.FRK resource.
Returns the total number of offsets found and parsed.
The hash keys of each offset hash are:
width
Image display width in pixels.
height
Image display height in pixels.
id
A 16-bit integer value used to uniquely identify the image inside a particular resource type.
restype
The four-letter resource type string.
constF1
A 32-bit value of unknown purpose which should always be 0xFFFFFFFF.
constF2
A second 32-bit value of unknown purpose which should always be 0xFFFFFFFF.
const0
A 32-bit integer value of unknown purpose which should always be 0x00000000.
constB
A 16-bit integer value of unknown purpose which could be 0xFFFA, 0xFFFB, 0xFFFC, or 0xFFFE.
unknown16
A 16-bit integer value of unknown purpose found only in 1150 resources.
unknown32
A 32-bit integer value of unknown purpose.
This method is called automatically by load() and load_resdir().
parse_text()
Parses the '    ' (DATA.FRK) resource loaded into
$self->{resources}, if present, extracting the text into
$self->{text}, uncompressing it if necessary. LZSS uncompression
will use the $self->{lzsslengthbits} and
$self->{lzssoffsetbits} attributes if present, and default to 3
length bits and 14 offset bits otherwise.
HTML headers and footers are then applied, and control codes replaced with appropriate tags.
Returns the length of the raw uncompressed text before any HTML modification was done, or undef if no text resource was found or the text was encrypted.
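The fallback to default LZSS parameters can be sketched with the defined-or operator. Only the parameter selection is shown; the decompression itself is not attempted here, and lzss_params_sketch() is a hypothetical name.

```perl
use strict;
use warnings;

# Use the values parsed from the !!cm resource when present,
# otherwise fall back to the documented defaults.
sub lzss_params_sketch {
    my ($self) = @_;
    my $lengthbits = $self->{lzsslengthbits} // 3;
    my $offsetbits = $self->{lzssoffsetbits} // 14;
    return ($lengthbits, $offsetbits);
}

my ($lengthbits, $offsetbits) = lzss_params_sketch({});
print "length bits: $lengthbits, offset bits: $offsetbits\n";
```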
parse_imp_toc_v1($tocdata)
Takes as a single argument a string containing the table of contents data, and parses it into object attributes following the version 1 format (10 bytes per entry).
Resource name. Stored in hash key name. In the case of the
'DATA.FRK' text resource, this will be four spaces ('    ').
Unknown, but always zero or one. Stored in hash key unknown1.
Size of the resource data in bytes. Stored in hash key size.
parse_imp_toc_v2($tocdata)
Takes as a single argument a string containing the table of contents data, and parses it into object attributes following the version 2 format (20 bytes per entry).
Resource name. Stored in name. In the case of the 'DATA.FRK' text
resource, this will be four spaces ('    ').
Unknown, but always zero. Stored in unknown1.
Size of the resource data in bytes. Stored in size.
Resource type. Stored in type, and used as the key for the stored
resource hash.
Unknown, but always either zero or one. Stored in unknown2.
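A 20-byte version 2 entry could be unpacked along the following lines. The field widths (4-byte name, 32-bit unknown1, 32-bit size, 4-byte type, 32-bit unknown2) are inferred from the 20-byte total and the resource header description, and big-endian byte order is an assumption; neither is confirmed by the text above. The sample names are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout: name(4) unknown1(4) size(4) type(4) unknown2(4),
# with integers in big-endian order.
my $ENTRY_TEMPLATE = 'a4 N N a4 N';
my $ENTRY_SIZE     = 20;

sub parse_toc_v2_sketch {
    my ($tocdata) = @_;
    my @toc;
    my $entries = int(length($tocdata) / $ENTRY_SIZE);
    for my $i (0 .. $entries - 1) {
        my $entry = substr($tocdata, $i * $ENTRY_SIZE, $ENTRY_SIZE);
        my ($name, $unknown1, $size, $type, $unknown2)
            = unpack($ENTRY_TEMPLATE, $entry);
        push @toc, {
            name     => $name,
            unknown1 => $unknown1,
            size     => $size,
            type     => $type,
            unknown2 => $unknown2,
        };
    }
    return \@toc;
}

# Two hand-built entries: an image resource and the text resource.
my $sample = pack($ENTRY_TEMPLATE, 'Pic1', 0, 1234, 'JPEG', 1)
           . pack($ENTRY_TEMPLATE, '    ', 0, 99, '    ', 0);
my $toc = parse_toc_v2_sketch($sample);
printf "%d entries parsed\n", scalar @$toc;
```

The 'a4' template keeps the four spaces of the text resource name intact, where 'A4' would strip them.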
set_book_properties(%args)
Sets the specified book properties. Returns 1 on success, or undef if no properties were specified.
identifier
The book identifier, as might be provided as an OPF <dc:identifier>
element.
category
The main book category, as might be provided as an OPF <dc:subject>
element.
subcategory
The subcategory, generally a set of search arguments for the ETI website.
title
The book title, as might be provided as an OPF <dc:title>
element.
lastname
The primary author's last name, but see the entry for firstname
before deciding how to handle name storage.
middlename
The primary author's middle name, but see the entry for firstname
before deciding how to handle name storage.
firstname
The primary author's first name, but this field is also used by a
great many .imp books to store the entire name in "First Last" format.
If this field is to be used this way, lastname and middlename
must be blank.
$imp->set_book_properties(title => 'My Best Book',
category => 'Fiction',
firstname => 'John Q. Public');
All procedures are exportable, but none are exported by default.
detect_resource_type(\$data)
Takes as a sole argument a reference to the data component of a resource. Returns a 4-byte string containing the resource type if detected successfully, or undef otherwise.
Detection will not work on the DATA.FRK ('    ') resource. That
one must be detected separately by name/type.
parse_imp_resource_v1()
Takes as a sole argument a string containing the data (including the 10-byte header) of a version 1 IMP resource.
Returns a hashref containing that data separated into the following keys:
name
The four-letter name of the resource.
type
The four-letter type of the resource. This is detected from the data, and is not part of the v1 header.
unknown1
A 16-bit unsigned int of unknown purpose. Expected values are 0 or 1.
Use with caution. This key may be renamed later if more information is found.
size
The expected size in bytes of the actual resource data. A warning will be carped if this does not match the actual size of the data following the header.
data
The actual resource data.
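Separating a version 1 resource into header fields and data might be sketched as follows. The layout (4-byte name, 16-bit unknown1, 32-bit size) is inferred from the 10-byte header length and the key descriptions, big-endian byte order is an assumption, and type detection from the data is left out. The function name and sample resource are hypothetical.

```perl
use strict;
use warnings;
use Carp;

# Assumed v1 header: name(4) unknown1(2) size(4), big-endian.
sub parse_resource_v1_sketch {
    my ($raw) = @_;
    return unless defined $raw && length($raw) >= 10;

    my ($name, $unknown1, $size) = unpack('a4 n N', $raw);
    my $data = substr($raw, 10);

    # Mirror the documented warning on a size mismatch.
    if ($size != length($data)) {
        carp "header size $size does not match data length "
            . length($data);
    }
    return {
        name     => $name,
        unknown1 => $unknown1,
        size     => $size,
        data     => $data,
    };
}

my $resource = parse_resource_v1_sketch(
    pack('a4 n N', 'Pic1', 1, 5) . 'hello');
print "resource $resource->{name}: $resource->{size} bytes\n";
```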
parse_imp_resource_v2()
Takes as a sole argument a string containing the data (including the 20-byte header) of a version 2 IMP resource.
Returns a hashref containing that data separated into the following keys:
name
The four-letter name of the resource.
unknown1
A 32-bit unsigned int of unknown purpose. Expected values are 0 or 1.
Use with caution. This key may be renamed later if more information is found.
size
The expected size in bytes of the actual resource data. A warning will be carped if this does not match the actual size of the data following the header.
type
The four-letter type of the resource.
unknown2
A 32-bit unsigned int of unknown purpose. Expected values are 0 or 1.
Use with caution. This key may be renamed later if more information is found.
data
The actual resource data.
Zed Pobre <zed@debian.org>
Thanks are due to Nick Rapallo <nrapallo@yahoo.ca> for invaluable assistance in understanding the .IMP format and testing this code.
Thanks are also due to Jeffrey Kraus-yao <krausyaoj@ameritech.net> for his work reverse-engineering the .IMP format to begin with, and the documentation at http://krausyaoj.tripod.com/reb1200.htm.
Copyright 2008 Zed Pobre
Licensed to the public under the terms of the GNU GPL, version 2.