KitaitiMakoto/epub-parser

EPUB Parser

INSTALLATION

gem install epub-parser

USAGE

As a library

require 'epub/parser'

book = EPUB::Parser.parse('book.epub')
book.metadata.titles # => Array of EPUB::Publication::Package::Metadata::Title. Main title, subtitle, etc...
book.metadata.title # => Title string including all titles
book.metadata.creators # => Creators(authors)
book.each_page_on_spine do |page|
  page.media_type # => "application/xhtml+xml"
  page.entry_name # => "OPS/nav.xhtml" entry name in EPUB package(zip archive)
  page.read # => raw content document
  page.content_document.nokogiri # => Nokogiri::XML::Document. The same as Nokogiri.XML(page.read)
  # do something more
  #    :
end
book.cover_image # => EPUB::Publication::Package::Manifest::Item which represents cover image file

See {file:docs/Home.markdown} or the API documentation for more info.
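
When you only need an overview of the reading order, the loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the library: `spine_summary` is a hypothetical name, and it relies only on `each_page_on_spine`, `entry_name`, and `media_type` as shown above.

```ruby
# Hypothetical helper (not part of EPUB Parser): collects the entry name and
# media type of every page on the spine, in reading order.
# `book` is anything that responds to #each_page_on_spine, such as the
# object returned by EPUB::Parser.parse.
def spine_summary(book)
  summary = []
  book.each_page_on_spine do |page|
    summary << { entry_name: page.entry_name, media_type: page.media_type }
  end
  summary
end

# Usage (requires the epub-parser gem and an actual EPUB file):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('book.epub')
#   spine_summary(book).each { |page| puts "#{page[:entry_name]} (#{page[:media_type]})" }
```

Returning plain hashes keeps the result independent of the opened archive, so it can be printed or serialized later.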

epubinfo command-line tool

The epubinfo tool extracts and shows the metadata of the specified EPUB book.

% epubinfo ./linear-algebra.epub
Title:              A First Course in Linear Algebra
Identifiers:        code.google.com.epub-samples.linear-algebra
Titles:             A First Course in Linear Algebra
Languages:          en
Contributors:
Coverages:
Creators:           Robert A. Beezer
Dates:
Descriptions:
Formats:
Publishers:
Relations:
Rights:             This work is shared with the public using the GNU Free Documentation License, Version 1.2., © 2004 by Robert A. Beezer.
Sources:
Subjects:
Types:
Modified:           2012-03-05T12:47:00Z
Unique identifier:  code.google.com.epub-samples.linear-algebra
Epub version:       3.0
Navigations:        toc, landmarks

See {file:docs/Epubinfo.markdown} for more info.
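
A similar aligned report can be produced from the library API. This is only a sketch under stated assumptions, not how epubinfo itself is implemented: `format_fields` and `metadata_fields` are hypothetical names, the 20-column label width is read off the sample output above, and each creator is assumed to respond to `to_s` with its display text.

```ruby
# Hypothetical helper (not part of EPUB Parser or epubinfo): formats
# label/value pairs in aligned columns, similar to the output above.
# Multiple values are joined with ", "; empty values leave only the label.
def format_fields(fields, width = 20)
  fields.map do |label, value|
    text = Array(value).map(&:to_s).join(", ")
    ("#{label}:".ljust(width) + text).rstrip
  end
end

# Hypothetical helper: picks two fields using only the metadata methods
# shown in the library usage section (metadata.title, metadata.creators).
def metadata_fields(book)
  {
    "Title" => book.metadata.title,
    "Creators" => book.metadata.creators
  }
end

# Usage (requires the epub-parser gem and an actual EPUB file):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('linear-algebra.epub')
#   puts format_fields(metadata_fields(book))
```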

epub-open command-line tool

The epub-open tool provides an interactive shell (IRB) that helps you inspect an EPUB book.

epub-open path/to/book.epub

IRB starts with self set to the EPUB book, so you can call EPUB methods directly.

title
=> "Title of the book"
metadata.creators
=> [Author 1, Author2, ...]
resources.first.properties
=> #<Set: {"nav"}> # This shows that the first resource of this book is the nav document
nav = resources.first
=> ...
nav.href
=> #<Addressable::URI:0x15ce350 URI:nav.xhtml>
nav.media_type
=> "application/xhtml+xml"
puts nav.read
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    :
    :
    :
</html>
=> nil
exit # Enter "exit" to end the session

See {file:docs/EpubOpen.markdown} for more info.
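
The session above finds the nav document by looking at the first resource, which only works when it happens to come first. A minimal sketch of a more general lookup follows; `find_nav_resource` is a hypothetical name, and it uses only `resources` and `properties` as seen in the session.

```ruby
# Hypothetical helper (not part of EPUB Parser): returns the first resource
# whose properties include "nav", or nil when the book has none.
# `book` is anything that responds to #resources, such as the object
# returned by EPUB::Parser.parse (or self inside an epub-open session).
def find_nav_resource(book)
  book.resources.find { |resource| resource.properties.include?("nav") }
end

# Usage inside an epub-open session, after defining the method:
#   nav = find_nav_resource(self)
#   puts nav.read if nav
```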

epub-cover command-line tool

The epub-cover tool extracts the cover image from an EPUB book.

% epub-cover childrens-literature.epub
Cover image output to cover.png

See {file:docs/EpubCover.adoc} for details.
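
The same result can be approximated from Ruby. This is a sketch, not the implementation of epub-cover: `save_cover` is a hypothetical name, it uses `cover_image`, `href`, and `read` as shown elsewhere in this README, and it assumes `cover_image` returns nil when the book declares no cover.

```ruby
# Hypothetical helper (not part of EPUB Parser or epub-cover): writes the
# book's cover image into `dir` and returns the written path.
# Returns nil when the book has no cover image (assumed behavior).
def save_cover(book, dir = ".")
  cover = book.cover_image
  return nil if cover.nil?

  # Keep only the base name of the item's href so the file lands in `dir`.
  path = File.join(dir, File.basename(cover.href.to_s))
  # Image data is binary, so write it without any encoding conversion.
  File.binwrite(path, cover.read)
  path
end

# Usage (requires the epub-parser gem and an actual EPUB file):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('childrens-literature.epub')
#   puts "Cover image output to #{save_cover(book)}"
```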

DOCUMENTATION

Documentation is available on the homepage.

If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the rubygems-yardoc gem is needed):

$ gem install epub-parser
$ gem yardoc epub-parser
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc

The path to the generated documentation (/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc in this example) is shown at the end.

Alternatively, you can generate it from a clone of the repository:

$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented

The documentation will then be available in the doc directory.

REQUIREMENTS

  • Ruby 2.6 or later

SIMILAR EFFORTS

  • gepub - a generic EPUB library for Ruby

  • epubinfo - Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.

  • ReVIEW - ReVIEW is an easy-to-use digital publishing system for books and ebooks.

  • epzip - epzip is an EPUB packing tool. It just does 'zip.' :)

  • eeepub - EeePub is a Ruby ePub generator

  • epub-maker - This library supports making and editing EPUB books based on this EPUB Parser library

  • epub-cfi - EPUB CFI library extracted from this EPUB Parser library.

If you find other gems, please tell me or send a pull request.

RECENT CHANGES

0.5.0

  • Follow Ruby 3.5 inspection change

0.4.9

  • Restructure test

  • Restructure Rake tasks

  • Update required Ruby version to 2.6

0.4.8

  • Add Rubyzip adapter

0.4.7

  • [BUG FIX] Fix a bug where epubinfo doesn’t handle navigation properly

0.4.6

  • [BUG FIX] Prevent the epubinfo tool from raising an exception when there are no nav elements

  • Tiny modification to Zip archive manipulation

  • Remove version specification from Nokogiri to migrate to Ruby 3.1

See {file:CHANGELOG.adoc} for older changelogs and details.

TODOS

  • Consider implementing an IRI feature instead of using Addressable

  • EPUB 3.2

  • Help features for epub-open tool

  • Vocabulary Association Mechanisms

  • Implementing navigation document and so on

  • Media Overlays

  • Content Document

  • Digital Signature

  • Handle encodings other than UTF-8

DONE

  • Simple inspect for epub-open tool

  • Use a zip library instead of the unzip command, which has a security issue

  • Modify methods around fallback to see bindings element in the package

  • Content Document(only for Navigation Documents)

  • Fixed Layout

  • Vocabulary Association Mechanisms(only for itemref)

  • Archive library abstraction

  • Extracting and organizing common behavior from some classes to modules

  • Multiple rootfiles

  • Abstraction of the XML parser (making it possible to use REXML, Ruby's standard bundled XML library)

LICENSE

This library is distributed under the terms of the MIT License. See the {file:MIT-LICENSE} file for more info.