Technology

Since its inception, Lumex AS has been run as a privately financed research project with the aim of expanding the boundaries of OCR technology. So far, more than NOK 50 million has been invested. The project has resulted in a number of ground-breaking new algorithms that can change the market's view of what capture software should be able to do.

Below is a list of conditions where Lumex delivers significant improvement over state-of-the-art technology:

  • Typewritten material of poor quality: all kinds of old and degraded documents produced by typewriters.
  • Documents with blackletter fonts such as Gothic/Fraktur.
  • Documents in languages with special letters such as ‘æ’, ‘ø’ and ‘å’ in Danish and Norwegian.
  • Documents where artifacts such as stamps, watermarks, handwritten text or manual underlining mask the text.
  • Digitization of documents stored on microfilm/microfiche.
  • Improved tolerance for other practical issues that contribute to poor results with current OCR technologies.

Below is an example of corrected characters in an old newspaper. The red boxes show corrected letters.

1906 Manilla Express


Adaptive character models

The Lumex technology builds and verifies adaptive character models based on an initial recognition, possibly from a standard OCR engine. This approach requires only self-similarity, works on all printed and typed fonts, and is very tolerant of noise, low-quality print and errors in the initial recognition.
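As a rough illustration of the idea, the sketch below averages glyph bitmaps that share an initial label into one template per character, then re-labels every glyph by its nearest template. It assumes equally sized, already aligned bitmaps, and the function names `build_templates`, `distance` and `reclassify` are hypothetical. It is a generic sketch, not Lumex's actual implementation.

```python
from collections import defaultdict

def build_templates(glyphs, labels):
    """Average all glyph bitmaps that share an initial label into one template."""
    groups = defaultdict(list)
    for glyph, label in zip(glyphs, labels):
        groups[label].append(glyph)
    templates = {}
    for label, members in groups.items():
        rows, cols = len(members[0]), len(members[0][0])
        templates[label] = [
            [sum(m[r][c] for m in members) / len(members) for c in range(cols)]
            for r in range(rows)
        ]
    return templates

def distance(glyph, template):
    """Sum of squared pixel differences between a glyph and a template."""
    return sum(
        (g - t) ** 2
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def reclassify(glyphs, templates):
    """Assign each glyph the label of its closest template."""
    return [min(templates, key=lambda lab: distance(g, templates[lab])) for g in glyphs]

# Toy 3x3 bitmaps (assumed data): an 'i' has a gap below the dot, an 'l' does not.
I = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
L = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
glyphs = [I, I, I, L, L, L]
initial = ["i", "i", "i", "l", "l", "i"]  # the last 'l' was misread as 'i'

templates = build_templates(glyphs, initial)
print(reclassify(glyphs, templates))  # ['i', 'i', 'i', 'l', 'l', 'l']
```

Because the templates are built from the document itself, the single initial error is outvoted by the correctly labelled glyphs of the same shape.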

Smart dictionary and phrase look-up

By analyzing word frequency and using fast look-up in large phrase databases and dictionaries (>13 million words), Lumex smart dictionary look-up improves recognition. Special dictionaries can be created and modified from text input. Direct internet look-up using a patented method is also possible.
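A frequency-weighted dictionary look-up can be sketched as follows. The example assumes that the recognizer supplies a short list of alternative characters for each position and that `word_freq` maps dictionary words to counts. Both the function `best_word` and the data are hypothetical, and the sketch does not cover phrase databases or internet look-up.

```python
from itertools import product

def best_word(alternatives, word_freq):
    """Pick the most frequent dictionary word among per-character alternatives.

    alternatives: one string of candidate characters per position, with the
    recognizer's first choice listed first.
    """
    first_choice = "".join(a[0] for a in alternatives)
    candidates = ("".join(chars) for chars in product(*alternatives))
    known = [w for w in candidates if w in word_freq]
    if not known:
        return first_choice  # nothing in the dictionary: keep the raw reading
    return max(known, key=lambda w: word_freq[w])

# Assumed toy dictionary with word counts.
word_freq = {"clock": 120, "click": 300, "dock": 80}

# The raw reading is "c1ock"; '1'/'l' and 'o'/'0' are typical confusions.
print(best_word(["c", "1l", "o0", "c", "k"], word_freq))  # clock
```

Enumerating every combination is only practical for short alternative lists; a production system would need a smarter search.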

Intelligent verification

Words are verified by gap/overlap filtering, optimal difference analysis and dictionary and phrase look-up (tunable according to document quality).

Iterative approach

New character classes can be found iteratively by combining the results of dictionary/phrase look-up with template analysis.

Optimal difference analysis

Lumex's patented difference analysis subtracts aligned template images to automatically find where confusion alternatives differ (as in the example figure for an ‘i’ and an ‘l’). This is used to find optimal difference criteria.

Difference template
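The subtraction step can be illustrated with a small sketch. It assumes two aligned templates of equal size with pixel values between 0 and 1. The names `difference_mask` and `closer_to_first` and the threshold of 0.5 are assumptions made for illustration, and the code is not Lumex's patented method.

```python
def difference_mask(template_a, template_b, threshold=0.5):
    """Mark the pixels where two aligned templates differ by more than threshold."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(template_a, template_b)
    ]

def closer_to_first(glyph, template_a, template_b, mask):
    """Compare a glyph with both templates, using only the masked pixels."""
    err_a = err_b = 0.0
    for r, mask_row in enumerate(mask):
        for c, use in enumerate(mask_row):
            if use:
                err_a += (glyph[r][c] - template_a[r][c]) ** 2
                err_b += (glyph[r][c] - template_b[r][c]) ** 2
    return err_a <= err_b

# Toy 4x3 templates (assumed data): 'i' has a gap under the dot, 'l' is solid.
template_i = [[0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0]]
template_l = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
mask = difference_mask(template_i, template_l)

# A noisy 'i': two stray ink pixels lie outside the discriminating region.
noisy_i = [[1, 1, 0], [0, 0, 0], [0, 1, 1], [0, 1, 0]]
print(closer_to_first(noisy_i, template_i, template_l, mask))  # True
```

Restricting the comparison to the pixels that separate the two candidates keeps noise elsewhere in the glyph from influencing the decision.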

Detecting partly hidden characters

Lumex has a patented method for finding partly obscured (hidden) characters. The occlusion can be the result of stamps, manual underlining or doodling, watermarks, paper creases or microfilm scratches.

Letter clutter

The figure below shows detected characters (correct in green) partially hidden behind a “COPY” stamp.

COPY Clutter

Detection of overlapping characters

Overlapping characters, whether deliberate as in ligatures or caused by printing/typing errors, can be detected by Lumex's patented method of combining templates.
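One generic way to combine templates is to overlay two single-character templates at different horizontal overlaps and keep the pair that best explains the observed blob. The sketch below assumes binary templates of equal height. The functions `combine` and `best_pair` are hypothetical, and the code is an illustration rather than Lumex's patented method.

```python
def combine(left, right, overlap):
    """Union of ink from two binary templates; the first `overlap` columns of
    `right` are laid on top of the last columns of `left`."""
    rows, lw, rw = len(left), len(left[0]), len(right[0])
    out = [[0] * (lw + rw - overlap) for _ in range(rows)]
    for r in range(rows):
        for c in range(lw):
            out[r][c] = left[r][c]
        for c in range(rw):
            x = lw - overlap + c
            out[r][x] = max(out[r][x], right[r][c])
    return out

def best_pair(blob, templates, max_overlap=2):
    """Return (mismatched pixels, left label, right label, overlap) for the
    combination that matches the blob best, or None if no width fits."""
    best = None
    for a, ta in templates.items():
        for b, tb in templates.items():
            for ov in range(max_overlap + 1):
                cand = combine(ta, tb, ov)
                if len(cand[0]) != len(blob[0]):
                    continue  # this combination has the wrong width
                err = sum(
                    x != y
                    for cand_row, blob_row in zip(cand, blob)
                    for x, y in zip(cand_row, blob_row)
                )
                if best is None or err < best[0]:
                    best = (err, a, b, ov)
    return best

# Toy 3x3 templates (assumed data) and a blob in which 'f' and 'i' touch.
templates = {
    "f": [[0, 1, 1], [1, 1, 0], [0, 1, 0]],
    "i": [[0, 1, 0], [0, 0, 0], [0, 1, 0]],
}
blob = [[0, 1, 1, 1, 0], [1, 1, 0, 0, 0], [0, 1, 0, 1, 0]]
print(best_pair(blob, templates))  # (0, 'f', 'i', 1)
```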

Enhanced Word Recognition and Correlation Metric

One of the most common errors in a standard OCR process is incorrect splitting of words into letters. Lumex Enhanced Word Recognition eliminates the problem of splitting into characters by using smart template analysis with gap/overlap filtering, and identifies the most likely recognition alternatives.

Dewarping text

Lumex's patented method for dewarping text lines can be used to straighten text lines in photographed text, scans of non-planar surfaces or curved lines in the original image.

Dewarping
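A very simple form of dewarping shifts each pixel column vertically until the text baseline lies on a single row. The sketch below assumes the baseline row has already been estimated for every column, which is the hard part in practice. The function `dewarp` is hypothetical and does not represent Lumex's patented method.

```python
def dewarp(image, baseline, background=0):
    """Shift every column vertically so that its baseline lands on one row.

    image: list of pixel rows; baseline: estimated baseline row per column.
    """
    rows, cols = len(image), len(image[0])
    target = sorted(baseline)[len(baseline) // 2]  # median baseline row
    out = [[background] * cols for _ in range(rows)]
    for c in range(cols):
        shift = target - baseline[c]
        for r in range(rows):
            if 0 <= r + shift < rows:
                out[r + shift][c] = image[r][c]
    return out

# Toy image (assumed data): a line of ink that sags at both ends.
curved = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
for row in dewarp(curved, baseline=[2, 1, 0, 1, 2]):
    print(row)
# [0, 0, 0, 0, 0]
# [1, 1, 1, 1, 1]
# [0, 0, 0, 0, 0]
```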

Local adaptive binarization

Lumex has a patented method for local adaptive binarization that works even for large contrast variations, as shown in the figure below.

Local adaptive thresholding
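To show why a local threshold helps when contrast varies, the sketch below uses a generic mean-based rule: a pixel counts as ink when it is darker than the mean of its neighbourhood by more than a fixed offset. This is a textbook technique, not Lumex's patented method, and the function `binarize_local` with its `radius` and `offset` parameters is an assumption made for illustration.

```python
def binarize_local(gray, radius=1, offset=30):
    """Return 1 for ink where a pixel is darker than its local mean minus offset."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            mean = sum(window) / len(window)
            out[r][c] = 1 if gray[r][c] < mean - offset else 0
    return out

# Toy image (assumed data): bright paper on the left, dark paper on the right.
# The left ink pixel (120) is brighter than the right background (100), so no
# single global threshold can separate ink from paper on both sides.
gray = [
    [200, 200, 200, 100, 100, 100],
    [200, 120, 200, 100,  40, 100],
    [200, 200, 200, 100, 100, 100],
]
for row in binarize_local(gray):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 0, 0, 1, 0]
# [0, 0, 0, 0, 0, 0]
```

The offset keeps the abrupt brightness step between the two halves from being mistaken for ink; the window size and offset would need tuning on real scans.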

IPR

The core technologies used in our products are covered by an extensive portfolio of 7 patent families, with some 38 individual patents in many countries around the world. New ideas and developments will add to the portfolio, and our know-how in OCR technology will spur further innovation.

  • The patent portfolio has broad technical applicability, and can support entry into many different markets.
  • Non-core patents in our portfolio will be spun off. We are currently seeking partners for one such patent, and will use the patent portfolio actively in the partnering process.