Lexis

                     AcronymaXtm

Power is knowledge.   Knowledge is information.   Information is language.

 
  Applications for AcronymaXtm
  Quick Examples
  Similar Systems
  How AcronymaXtm Works
  Performance, Dependencies, System Requirements
  Custom Features, Ports
  Evaluation & Licensing
  Source Licensing
  Support
  Integration
  Documentation

Applications for AcronymaXtm

AcronymaXtm brings your natural language processing applications one step closer to a real understanding of the semantics of text. Given a plain text or HTML/XML data stream, it returns acronym/definition pairs along with context information.

The two most common uses of AcronymaXtm are building databases of acronym/definition pairs and tagging acronym/definition pairs in linguistic corpora and user documents.

Quick Examples

Here are some of the acronym/definition correspondences AcronymaXtm can find:
Application Programming Interface (API),
(API) Application Programming Interface — the order in which the acronym and its definition appear does not matter.

NYPD New York City Police Department — limited skips are allowed in definitions.

3DES Triple Data Encryption Standard,
3GPP 3rd Generation Partnership Project,
4H Head, Heart, Hands, Health,
G8 Group of Eight — digits are accounted for.

A&A Astronomy and Astrophysics — ampersands are okay.

MOR middle-of-the-road — compound words are okay and the case of words does not affect matching.

MOR stands for "middle-of-the-road" — explanatory constructs are ignored to an extent.

MSB Most Significant Bit;Most Significant Byte — multiple matches are fine.

D.A.R.P.A. Defense Advanced Research Projects Agency,
A/S/L Age, Sex, Language — embedded punctuation is okay.
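
To make a few of these rules concrete, here is a minimal sketch of initial-letter matching with embedded punctuation removed, case ignored, compound words split, and a limited number of skipped words. It is not the AcronymaXtm algorithm, whose internals are not described here. The names normalizeAcronym, splitWords, matchesDefinition and the default of one skipped word are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Keep only letters and digits, lowercased: "D.A.R.P.A." -> "darpa", "A&A" -> "aa".
std::string normalizeAcronym(const std::string& acronym) {
    std::string letters;
    for (unsigned char c : acronym) {
        if (std::isalnum(c)) letters.push_back(static_cast<char>(std::tolower(c)));
    }
    return letters;
}

// Split a definition into lowercased words on any non-alphanumeric character,
// so "middle-of-the-road" yields {"middle", "of", "the", "road"}.
std::vector<std::string> splitWords(const std::string& definition) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : definition) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Pair letters[li..] with the initials of words[wi..], skipping at most
// skipsLeft words. Only interior words may be skipped, so the first and
// last word of the definition must each match a letter.
static bool matchFrom(const std::string& letters, std::size_t li,
                      const std::vector<std::string>& words, std::size_t wi,
                      int skipsLeft) {
    if (li == letters.size()) return wi == words.size();
    if (wi == words.size()) return false;
    if (words[wi][0] == letters[li] &&
        matchFrom(letters, li + 1, words, wi + 1, skipsLeft)) {
        return true;
    }
    const bool interior = wi > 0 && wi + 1 < words.size();
    return interior && skipsLeft > 0 &&
           matchFrom(letters, li, words, wi + 1, skipsLeft - 1);
}

// Hypothetical helper: true if the definition's initials spell the acronym,
// allowing up to maxSkips skipped interior words (assumed default: 1).
bool matchesDefinition(const std::string& acronym, const std::string& definition,
                       int maxSkips = 1) {
    assert(maxSkips >= 0);
    const std::string letters = normalizeAcronym(acronym);
    const std::vector<std::string> words = splitWords(definition);
    if (letters.empty() || words.empty()) return false;
    return matchFrom(letters, 0, words, 0, maxSkips);
}
```

With these assumptions, matchesDefinition("NYPD", "New York City Police Department") succeeds by skipping "City", and matchesDefinition("A&A", "Astronomy and Astrophysics") succeeds by skipping "and". The sketch does not map digits to number words, so a pair such as 3DES and Triple Data Encryption Standard is not matched by it.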

How AcronymaXtm Works

AcronymaXtm:
—parses a source text into standalone tokens, where a token is one standalone word or a part of a compound word.
—assigns every token a type that identifies the possible role of the token in the source text.
—searches for possible definitions in the vicinity (context window) of every acronym candidate token (such as 'PPP', 'M.A.S.H.', or '4WD').
—assigns confidence scores to the resulting matches and filters out matches with low confidence.
—returns matches and context information to your application through multiple callbacks.
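
The first three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the TokenType values, the separator set, the candidate test in looksLikeAcronym, and the fixed token radius of the context window are all hypothetical. Scoring and callbacks are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token types; the real product may distinguish many more.
enum class TokenType { Word, AcronymCandidate };

struct Token {
    std::string text;
    TokenType type;
};

// Assumed candidate test: only uppercase letters, digits, '.', '/' and '&',
// with at least one uppercase letter and at least two letters or digits.
bool looksLikeAcronym(const std::string& text) {
    int upper = 0;
    int alnum = 0;
    for (unsigned char c : text) {
        if (std::isupper(c)) { ++upper; ++alnum; }
        else if (std::isdigit(c)) { ++alnum; }
        else if (c != '.' && c != '/' && c != '&') return false;
    }
    return upper >= 1 && alnum >= 2;
}

// Steps 1 and 2: split on whitespace, hyphens, brackets, and quotes, so each
// part of a compound word becomes its own token, then assign each token a type.
std::vector<Token> tokenize(const std::string& text) {
    const std::string separators = "-()[]{},;:\"'";
    std::vector<Token> tokens;
    std::string current;
    auto flush = [&]() {
        if (current.empty()) return;
        const TokenType type = looksLikeAcronym(current)
                                   ? TokenType::AcronymCandidate
                                   : TokenType::Word;
        tokens.push_back(Token{current, type});
        current.clear();
    };
    for (unsigned char c : text) {
        if (std::isspace(c) ||
            separators.find(static_cast<char>(c)) != std::string::npos) {
            flush();
        } else {
            current.push_back(static_cast<char>(c));
        }
    }
    flush();
    return tokens;
}

// Step 3: collect the tokens within `radius` positions of a candidate,
// excluding the candidate itself. A definition search would run over these.
std::vector<std::string> contextWindow(const std::vector<Token>& tokens,
                                       std::size_t index, std::size_t radius) {
    assert(index < tokens.size());
    const std::size_t begin = index > radius ? index - radius : 0;
    const std::size_t end = std::min(tokens.size(), index + radius + 1);
    std::vector<std::string> window;
    for (std::size_t i = begin; i < end; ++i) {
        if (i != index) window.push_back(tokens[i].text);
    }
    return window;
}
```

For the input "The Point-to-Point Protocol (PPP) is old", this sketch produces eight tokens, marks only PPP as an acronym candidate, and a radius of 2 around it yields the window Point, Protocol, is, old.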

Similar Systems

The Acrophile project (http://ciir.cs.umass.edu/irdemo/acronym) implements a similar approach to extracting acronyms from text and building a database of acronym/definition pairs. There is an online extraction demo. If we feed the above list of example acronym/definition pairs into Acrophile, here is what we get: Test Acrophile.

Performance, Dependencies, System Requirements

The performance of AcronymaXtm depends strongly on the nature of the source text and of the acronym candidates it has to analyze. In some texts, the acronym candidates may cause severe slowdowns. Such is the case with lengthy uppercase letter sequences, which in most situations turn out to be garbage. Although AcronymaXtm implements countermeasures to minimize the analysis of garbage sequences, a performance penalty still applies. In general, you can expect AcronymaXtm to process up to 1 gigabyte of text per hour (roughly 280 to 300 kilobytes per second) on consumer-grade hardware such as a 2.8 GHz Pentium 4.

AcronymaXtm uses International Components for Unicode (ICU) by IBM to implement support for Unicode. The ICU DLLs are required to run AcronymaXtm and are included with it.

System requirements for AcronymaXtm are as follows. RAM consumption depends on the size of the source texts (or of the read buffers you use when reading from a file). Minimum disk space required to host both the debug and release versions of AcronymaXtm: 15 megabytes. Recommended CPU: equivalent to a 3.0 GHz Pentium 4. Recommended operating systems: Windows XP, Windows 2000.

Custom Features, Ports

Tailoring of AcronymaXtm to your particular application is possible; I will charge you a one-time fee that is negotiable in each specific case.

Porting of AcronymaXtm to a platform of your choice is almost always possible for a nominal fee. The target platform must have a modern C++ compiler capable of building AcronymaXtm and ICU.

Evaluation & Licensing

AcronymaXtm is licensed on a per-application basis. The license price depends strongly on the actual application and the mode of use. Please contact me with information on your product to get a quote. Visit the online demo page to evaluate AcronymaXtm.

Source Licensing

You can license the source code for AcronymaXtm. The right to use the source will be non-exclusive, and the license will not allow you to resell either the original or a modified version of AcronymaXtm. The current one-time fee for source licensing is $15,000.

If you are looking to acquire all rights to AcronymaXtm, source code included, you can do that for a negotiable one-time fee.

For quotations and other specific business inquiries, please contact me.

Support

Technical support for AcronymaXtm is free of charge.

Integration

AcronymaXtm is a DLL that exports its functions using the __stdcall calling convention. This means AcronymaXtm can be called from any language that supports calling __stdcall DLL functions, which covers most modern languages on Windows. A sample wrapper for C# and examples in C++ are included with AcronymaXtm. For more information, refer to the AcronymaXtm Manual.
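
As a rough illustration of what calling a __stdcall export with a callback looks like from C++, here is a self-contained sketch. Everything named in it is hypothetical: the ACX_CALL macro, the AcxMatchCallback signature, and AcxExtractStub, which is a stand-in that reports one fixed pair and is not a function exported by AcronymaXtm. The real exports and callback signatures are described in the AcronymaXtm Manual.

```cpp
#include <cassert>
#include <string>
#include <vector>

// __stdcall exists on MSVC and MinGW; on other compilers the macro expands
// to nothing so that the sketch still builds.
#if defined(_MSC_VER) || defined(__MINGW32__)
#  define ACX_CALL __stdcall
#else
#  define ACX_CALL
#endif

// Hypothetical callback signature, invented for this example.
typedef int (ACX_CALL *AcxMatchCallback)(const char* acronym,
                                         const char* definition,
                                         double confidence,
                                         void* userData);

struct Match {
    std::string acronym;
    std::string definition;
    double confidence;
};

// Stand-in for a DLL export (NOT a real AcronymaXtm function). It ignores
// the text and reports a single hard-coded pair to show the calling pattern.
extern "C" int ACX_CALL AcxExtractStub(const char* text,
                                       AcxMatchCallback onMatch,
                                       void* userData) {
    (void)text;
    if (onMatch == nullptr) return 0;
    onMatch("API", "Application Programming Interface", 1.0, userData);
    return 1;  // number of matches reported
}

// Application-side callback: copies each reported match into a vector that
// is passed through the opaque userData pointer.
int ACX_CALL collectMatch(const char* acronym, const char* definition,
                          double confidence, void* userData) {
    assert(userData != nullptr);
    auto* matches = static_cast<std::vector<Match>*>(userData);
    matches->push_back(Match{acronym, definition, confidence});
    return 1;
}

// Convenience wrapper that gathers all reported matches.
std::vector<Match> extractAll(const char* text) {
    std::vector<Match> matches;
    AcxExtractStub(text, &collectMatch, &matches);
    return matches;
}
```

In a real application the function pointer would typically be obtained from the DLL at run time, for example with the Windows LoadLibrary and GetProcAddress functions, instead of being defined in the same source file. Passing state through a void pointer keeps the callback a plain function, which is what a C-style DLL interface requires.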

Documentation

It is there.

© 2005-2006 by Mikhail Zislis of Software Species, all rights reserved