AcronymaXtm brings your natural language processing applications one step closer to
a real understanding of text semantics. For any given plain text or HTML/XML
data stream, it returns acronym/definition pairs along with context information.
The two most common uses of AcronymaXtm are building databases of
acronym/definition pairs and tagging acronym/definition pairs in linguistic corpora and
user documents.
|
|
|
Here are some of the acronym/definition correspondences AcronymaXtm can find:
Application Programming Interface (API),
(API) Application Programming Interface —
the order in which an acronym and its definition appear does not matter.
NYPD New York City Police Department —
a limited number of definition words may be skipped (here, "City").
3DES Triple Data Encryption Standard,
3GPP 3rd Generation Partnership Project,
4H Head, Heart, Hands, Health,
G8 Group of Eight —
digits are accounted for.
A&A Astronomy and Astrophysics —
ampersands are okay.
MOR middle-of-the-road —
compound words are okay and the case of words does not affect matching.
MOR stands for "middle-of-the-road" —
explanatory constructs such as "stands for" are ignored, within limits.
MSB Most Significant Bit;Most Significant Byte —
multiple matches are fine.
D.A.R.P.A. Defense Advanced Research Projects Agency,
A/S/L Age, Sex, Language — embedded punctuation is okay.
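The matching rules illustrated by the examples above can be sketched in a few lines of C++. This is a minimal illustration only, not AcronymaXtm's actual algorithm: the function names (normalize_acronym, split_words, is_plausible_definition) and the max_skips parameter are assumptions made for this sketch. It handles embedded punctuation, ampersands, compound words, case-insensitive matching, and a small skip budget. Digits in acronyms are simply ignored here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Keep only what must be matched: letters (upper-cased) and '&'.
// "D.A.R.P.A." becomes "DARPA", "A/S/L" becomes "ASL". Digits are dropped.
std::string normalize_acronym(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalpha(c)) out += static_cast<char>(std::toupper(c));
        else if (c == '&') out += '&';
    }
    return out;
}

// Split a definition into lower-cased words on any non-alphanumeric
// character, so the compound "middle-of-the-road" yields four words.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) words.push_back(cur);
    return words;
}

// '&' matches the word "and"; a letter matches a word's initial.
bool char_matches_word(char a, const std::string& w) {
    if (a == '&') return w == "and";
    return !w.empty() && std::toupper(static_cast<unsigned char>(w[0])) == a;
}

// Match acr[ai..] against words[wi..]; each skipped word costs one skip.
bool match_from(const std::string& acr, std::size_t ai,
                const std::vector<std::string>& words, std::size_t wi,
                int skips_left) {
    if (ai == acr.size()) {
        // All acronym characters are matched; leftover words count as skips.
        return static_cast<int>(words.size() - wi) <= skips_left;
    }
    if (wi == words.size()) return false;
    if (char_matches_word(acr[ai], words[wi]) &&
        match_from(acr, ai + 1, words, wi + 1, skips_left)) {
        return true;
    }
    return skips_left > 0 && match_from(acr, ai, words, wi + 1, skips_left - 1);
}

// Hypothetical entry point: true if `definition` could expand `acronym`.
bool is_plausible_definition(const std::string& acronym,
                             const std::string& definition,
                             int max_skips = 1) {
    assert(max_skips >= 0);
    const std::string acr = normalize_acronym(acronym);
    if (acr.empty()) return false;
    return match_from(acr, 0, split_words(definition), 0, max_skips);
}
```

With the default budget of one skip, is_plausible_definition("NYPD", "New York City Police Department") accepts the pair by skipping "City", and "MOR" matches "middle-of-the-road" by skipping "the". A production matcher would also need to handle digits (as in 3DES or G8) and to score competing candidates, which this sketch leaves out.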
|
|
|
AcronymaXtm:
—parses a source text into standalone tokens, where a token is one standalone word or a
part of a compound word.
—assigns every token a type that identifies the possible role of the token in the source text.
—performs search for possible definitions in the vicinity (context window) of every acronym
candidate token (such as 'PPP', 'M.A.S.H.', or '4WD').
—assigns confidence scores to the resulting matches and performs filtering to exclude
matches with low confidence.
—returns matches and context information to your application through multiple callbacks.
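The steps above can be pictured with a deliberately simplified C++ sketch. Everything in it is an assumption made for illustration: the type names (Token, Match), the extract function, the rule for spotting acronym candidates, the fixed-size context window, and the confidence heuristic are placeholders, not AcronymaXtm's real API or scoring. The sketch does not skip words and does not interpret digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

enum class TokenType { Word, AcronymCandidate };
struct Token { std::string text; TokenType type; };
struct Match { std::string acronym; std::string definition; double confidence; };

// Upper-cased letters of a token: "M.A.S.H." -> "MASH", "4WD" -> "WD".
static std::string letters_of(const std::string& s) {
    std::string out;
    for (unsigned char c : s)
        if (std::isalpha(c)) out += static_cast<char>(std::toupper(c));
    return out;
}

// Typing step: a candidate has no lowercase letters, at least two
// letters/digits in total, and at least one letter.
static TokenType classify(const std::string& s) {
    int alnum = 0, alpha = 0;
    for (unsigned char c : s) {
        if (std::islower(c)) return TokenType::Word;
        if (std::isalnum(c)) ++alnum;
        if (std::isalpha(c)) ++alpha;
    }
    return (alnum >= 2 && alpha >= 1) ? TokenType::AcronymCandidate
                                      : TokenType::Word;
}

// Parsing step: letters, digits and '.' build tokens; anything else
// (spaces, hyphens, brackets, commas) separates them.
std::vector<Token> tokenize(const std::string& text) {
    std::vector<Token> tokens;
    std::string cur;
    auto flush = [&] {
        if (!cur.empty()) { tokens.push_back({cur, classify(cur)}); cur.clear(); }
    };
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '.') cur += static_cast<char>(c);
        else flush();
    }
    flush();
    return tokens;
}

// Do the tokens starting at `first` spell `letters` with their initials?
static bool initials_match(const std::vector<Token>& t, std::size_t first,
                           const std::string& letters) {
    if (first + letters.size() > t.size()) return false;
    for (std::size_t k = 0; k < letters.size(); ++k) {
        const std::string& w = t[first + k].text;
        if (std::toupper(static_cast<unsigned char>(w[0])) != letters[k]) return false;
    }
    return true;
}

// Search, score, filter, and report each surviving match via a callback.
void extract(const std::string& text, double min_confidence,
             const std::function<void(const Match&)>& on_match) {
    assert(min_confidence >= 0.0 && min_confidence <= 1.0);
    const std::vector<Token> t = tokenize(text);
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (t[i].type != TokenType::AcronymCandidate) continue;
        const std::string letters = letters_of(t[i].text);
        const std::size_t n = letters.size();
        // Context window: the n tokens just before, else the n just after.
        std::size_t first = t.size();  // sentinel meaning "no definition found"
        if (i >= n && initials_match(t, i - n, letters)) first = i - n;
        else if (initials_match(t, i + 1, letters)) first = i + 1;
        if (first == t.size()) continue;
        std::string definition;
        for (std::size_t k = 0; k < n; ++k)
            definition += (k ? " " : "") + t[first + k].text;
        // Placeholder score: share of the acronym's letters/digits explained.
        std::size_t alnum = 0;
        for (unsigned char c : t[i].text) if (std::isalnum(c)) ++alnum;
        const double confidence = static_cast<double>(n) / static_cast<double>(alnum);
        if (confidence >= min_confidence) on_match(Match{t[i].text, definition, confidence});
    }
}
```

Called on "Four Wheel Drive (4WD)", this sketch pairs 4WD with "Wheel Drive" at a confidence of 2/3, because the digit is left unexplained. A threshold of 0.8 therefore filters the match out, while a threshold of 0.5 reports it.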
|
|
|
The Acrophile project (http://ciir.cs.umass.edu/irdemo/acronym) implements a similar approach to extracting acronyms from text and building
a database of acronym/definition pairs. There is an online extraction demo. If we feed the above list of example acronym/definition pairs into Acrophile,
here is what we get: Test Acrophile.
|
|
|
The performance of AcronymaXtm depends strongly on the nature of the source text and of the acronym
candidates to be analyzed. In some texts, the acronym candidates themselves can cause
severe slowdowns. Such is the case with lengthy uppercase letter sequences, which in most
situations prove to be simply garbage. Although AcronymaXtm implements countermeasures to minimize
analysis of garbage sequences, a performance penalty still applies. In general, you can expect
AcronymaXtm to process up to 1 gigabyte of text per hour on consumer-grade hardware such
as a 2.8 GHz Pentium 4.
AcronymaXtm uses International Components for Unicode (ICU) by IBM to implement support for
Unicode. The ICU DLLs are required to run AcronymaXtm and are included with it.
System requirements for AcronymaXtm are as follows. RAM consumption depends on the sizes of
the source texts (or of the read buffers you use when reading from a file). Minimum disk space required
to host both debug and release versions of AcronymaXtm: 15 megabytes. Recommended CPU: the equivalent of
a 3.0 GHz Pentium 4. Recommended operating systems: Windows XP, Windows 2000.
|
|
|
AcronymaXtm can be tailored to your particular application for a one-time fee,
which is negotiable in each specific case.
Porting of AcronymaXtm to a platform of your choice is almost always possible for a nominal fee.
The target platform must have a modern C++ compiler that can build both AcronymaXtm and ICU.
|
|
|
AcronymaXtm is licensed on a per-application basis. The license quotation depends strongly
on the actual application and the mode of use. Please contact me with
information on your product to get a quote. Visit the online demo page to evaluate AcronymaXtm.
|
|
|
You can license the source code for AcronymaXtm. The right to use the source will be
non-exclusive, and the license will not allow you to resell either the original or modified versions
of AcronymaXtm. The current one-time fee for source licensing is $15,000.
If you are looking to acquire all rights to AcronymaXtm, source code included, you
can do that for a negotiable one-time fee.
For quotations and other specific business inquiries, please contact me.
|
|
Technical support for AcronymaXtm is free of charge.
|
AcronymaXtm is a DLL that exports its functions with the __stdcall calling convention. This means
AcronymaXtm integrates easily with applications written in modern languages. A sample wrapper for C# and examples
in C++ are included with AcronymaXtm. For more information, refer to the AcronymaXtm Manual.
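To make the __stdcall callback style concrete, here is a rough C++ sketch of how an application might receive acronym/definition pairs through a C-style callback. Every name in it is hypothetical and invented for illustration: AcxMatchCallback, acx_find_pairs_stub, and collect_pair do not come from the AcronymaXtm Manual. The "library" function is a local stand-in that reports one hard-coded pair, so that the sketch builds and runs without the DLL. Consult the Manual for the real exported functions and signatures.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// __stdcall is a Windows calling convention; on other platforms the macro
// expands to nothing so that this sketch still compiles.
#if defined(_WIN32)
#define ACX_CALL __stdcall
#else
#define ACX_CALL
#endif

extern "C" {
// HYPOTHETICAL callback type: called once per acronym/definition pair.
// `user_data` is passed through untouched so the caller can keep state.
typedef void (ACX_CALL *AcxMatchCallback)(const char* acronym,
                                          const char* definition,
                                          void* user_data);
}

// HYPOTHETICAL stand-in for a function a DLL might export. It does no real
// analysis: it reports one fixed pair when the text contains "(API)".
// Returns the number of pairs reported, or -1 on bad arguments.
extern "C" int ACX_CALL acx_find_pairs_stub(const char* text,
                                            AcxMatchCallback on_match,
                                            void* user_data) {
    if (text == nullptr || on_match == nullptr) return -1;
    if (std::strstr(text, "(API)") != nullptr) {
        on_match("API", "Application Programming Interface", user_data);
        return 1;
    }
    return 0;
}

struct Pair {
    std::string acronym;
    std::string definition;
};

// Application-side callback: `user_data` points at the vector to fill.
// The strings are copied because the pointers may only be valid during
// the call.
extern "C" void ACX_CALL collect_pair(const char* acronym,
                                      const char* definition,
                                      void* user_data) {
    assert(user_data != nullptr);
    auto* out = static_cast<std::vector<Pair>*>(user_data);
    out->push_back(Pair{acronym, definition});
}

// Convenience wrapper that hides the callback plumbing from callers.
std::vector<Pair> find_pairs(const std::string& text) {
    std::vector<Pair> pairs;
    acx_find_pairs_stub(text.c_str(), &collect_pair, &pairs);
    return pairs;
}
```

In a real Windows application the stub would be replaced by a function pointer obtained from the DLL, for example through LoadLibrary and GetProcAddress or through an import library. The user_data pointer is a common way to carry application state through a plain C callback without resorting to global variables.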
|
It is there.