Abacus Programming Corporation:
Systems Engineering and Software Specialists

ABACUS PRODUCT DESCRIPTIONS

The Abacus Data Analysis Scanning and Extraction Language™ is a powerful scripting language that provides the capability to create text extraction scanners.

Data Analysis Scanning and Extraction Language


The Abacus Data Analysis Scanning and Extraction Language™ (DASEL™) is a powerful scripting language that lets an analyst create text scanners that extract useful data from text files such as documents and downloaded HTML. It provides tools for the following features:

  

Features

* Forward and backward navigation through the scanned text
* Scanning sub-blocks to improve performance
* Start and finish recognition strings
* ASCII character code stripping
* Text matching by strings or by numeric position
* String occurrence, case, and inclusion options
* Multiple scanner specifications
* Data extraction by attribute and by text boundaries
* Character sequence reduction
* General replace, trim, and cut options
* If/Then/Else conditional execution with And/Or options
* Conditional execution based on command success or failure
* String, position, and item number comparisons
* Audit trail log, parse tree log, and error log available for viewing
* Scanner specification and scanned text available for edit
* User-defined rejection criteria
* Include files
* Data type formatting
* Search site relevancy analysis
* Substring extraction
* Arithmetic functions
* User-defined attributes
* Variable Report Generator
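Several of the features above describe plain text transformations. The Python sketch below is a rough analogue of three of them: ASCII character code stripping (removing characters by their code), character sequence reduction (collapsing runs of a repeated sequence), and start/finish recognition strings used to isolate a scanning sub-block. The exact DASEL™ semantics are not described on this page, so these meanings are assumptions, and the function names are hypothetical rather than part of DASEL™.

```python
import re
from collections.abc import Iterable


def strip_ascii_codes(text: str, codes: Iterable[int]) -> str:
    """Remove every character whose code point appears in `codes` (hypothetical helper)."""
    drop = set(codes)
    return "".join(ch for ch in text if ord(ch) not in drop)


def reduce_sequences(text: str, seq: str = " ") -> str:
    """Collapse each run of two or more consecutive `seq` into a single `seq`."""
    pattern = "(?:" + re.escape(seq) + "){2,}"
    # A function replacement avoids treating backslashes in `seq` as escapes.
    return re.sub(pattern, lambda _m: seq, text)


def sub_block(text: str, start: str, finish: str) -> str:
    """Return the text after the first `start` up to the next `finish`.

    Returns '' when `start` is absent, and the rest of the text when
    `finish` never appears after `start`.
    """
    i = text.find(start)
    if i < 0:
        return ""
    i += len(start)
    j = text.find(finish, i)
    return text[i:] if j < 0 else text[i:j]


raw = "Header\tSTART  A\r\n  B   END tail"
cleaned = reduce_sequences(strip_ascii_codes(raw, {9, 13}))  # drop TAB (9) and CR (13)
print(repr(sub_block(cleaned, "START", "END")))  # ' A\n B '
```

Stripping runs before reduction here so that removed control characters cannot leave behind new runs that escape collapsing.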
    

DASEL™ Example Application: Internet Scanner Development


The DASEL™ tool can be used to develop text scanners for any target text file. As an example, the procedure for developing rule-based text scanners for web pages is summarized in the figure above. The first step is to identify the exact set of Internet Uniform Resource Locators (URLs) to be analyzed; this is usually done by visiting each site with a standard browser and copying its URL from the address bar. Sample HTML is then retrieved from each site and saved. Next, the rules that define the text extraction scanners are written in the DASEL™ language, and the DASEL™ Interpreter tests the scanners by running them against the saved HTML. Guided by the output reports, including the Error Log Report, the Audit Trail Report, and the Parse Tree Report, the analyst revises the rules until they correctly scan the target text.
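The test-and-revise loop described above can be pictured as a small harness. The Python sketch below is hypothetical and is not the DASEL™ Interpreter: a scanner is modeled as any function from saved HTML to a dictionary of extracted fields, the pages are assumed to have already been retrieved and saved (here held in a dictionary keyed by URL), and `ScanReport` only loosely mirrors the Audit Trail and Error Log reports by recording every page scanned and every page whose scanner failed or left a required field empty.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical model: a scanner maps one saved HTML page to extracted fields.
Scanner = Callable[[str], dict[str, str]]


@dataclass
class ScanReport:
    audit_trail: list[str] = field(default_factory=list)  # every page scanned
    error_log: list[str] = field(default_factory=list)    # failures and empty fields
    results: dict[str, dict[str, str]] = field(default_factory=dict)


def run_scanner(scanner: Scanner, pages: dict[str, str], required: list[str]) -> ScanReport:
    """Run `scanner` over each saved page and log problems for the analyst to review."""
    report = ScanReport()
    for url, html in pages.items():
        report.audit_trail.append(f"scanning {url}")
        try:
            fields = scanner(html)
        except Exception as exc:  # a failing page should not stop the whole run
            report.error_log.append(f"{url}: scanner raised {exc!r}")
            continue
        missing = [name for name in required if not fields.get(name)]
        if missing:
            report.error_log.append(f"{url}: missing {', '.join(missing)}")
        report.results[url] = fields
    return report


def title_scanner(html: str) -> dict[str, str]:
    """Toy scanner for illustration only."""
    if "<title>" not in html:
        raise ValueError("no title tag")
    return {"title": html.split("<title>", 1)[1].split("</title>", 1)[0]}


pages = {
    "http://example.com/a": "<title>A</title>",
    "http://example.com/b": "<p>none</p>",
    "http://example.com/c": "<title></title>",
}
report = run_scanner(title_scanner, pages, ["title"])
for line in report.error_log:
    print(line)
```

In this sketch the analyst would read `error_log`, adjust the scanner rules, and rerun until the log is empty.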


DASEL™ Example Scanner

A sample DASEL™ scanner specification that extracts the title, text, and URL from a web page is shown below.

DASEL
DECLARE
SiteType Search
Attribute Title Cycle 5
Attribute Text Cycle 5
Attribute URL Cycle 7

SCANNER 1
SETUP
Start [red><B>WEB]
Jump [director.asp]
Finish [OPINION]
Return 20

CYCLE
Extract Title From [>] To [<]
If Title == [ ]
Then Extract Text From [<br>] To [<]
Extract URL From [</i>] To [<]
Include [c:\dasel\scanners\stdtrims.txt]
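For readers new to DASEL™, the Python sketch below approximates what the sample scanner appears to do. It rests on our own reading of the keywords, which this page does not define: Start and Finish bound the block of HTML to scan, each occurrence of the Jump string begins a new item, Return 20 caps the number of items, and Extract From [a] To [b] takes the text between the next a and the following b. The `If Title == [ ]` test is read as "extract Text only when the title is blank", the `Cycle` values in DECLARE are ignored, and the unknown contents of stdtrims.txt are approximated by stripping surrounding whitespace. The names `extract` and `scan_example` are hypothetical.

```python
from typing import Optional

START, JUMP, FINISH = "red><B>WEB", "director.asp", "OPINION"


def extract(text: str, start: str, end: str, pos: int = 0) -> tuple[Optional[str], int]:
    """Return the text between the next `start` (at or after `pos`) and the following
    `end`, plus the new cursor position; (None, pos) if either marker is missing."""
    i = text.find(start, pos)
    if i < 0:
        return None, pos
    i += len(start)
    j = text.find(end, i)
    if j < 0:
        return None, pos
    return text[i:j], j


def scan_example(html: str, max_items: int = 20) -> list[dict[str, str]]:
    """Rough analogue of the sample scanner; keyword semantics are assumed."""
    s = html.find(START)  # SETUP: Start
    if s < 0:
        return []
    s += len(START)
    f = html.find(FINISH, s)  # SETUP: Finish
    block = html[s:] if f < 0 else html[s:f]
    items = []
    # SETUP: Jump splits the block into items; Return caps how many are kept.
    for chunk in block.split(JUMP)[1:max_items + 1]:
        title, pos = extract(chunk, ">", "<")
        item = {"Title": (title or "").strip()}  # strip() stands in for stdtrims.txt
        if not item["Title"]:  # If Title == [ ] Then Extract Text ...
            text, pos = extract(chunk, "<br>", "<", pos)
            item["Text"] = (text or "").strip()
        url, pos = extract(chunk, "</i>", "<", pos)
        item["URL"] = (url or "").strip()
        items.append(item)
    return items


sample = (
    'junk <font color=red><B>WEB results '
    '<a href="director.asp?id=1">First hit</a><i>x</i>www.a.com<br> '
    '<a href="director.asp?id=2"> </a><br>Body text<i>y</i>www.b.com<p> '
    'OPINION <a href="director.asp?id=3">Ignored</a>'
)
for item in scan_example(sample):
    print(item)
```

The third link follows the Finish string, so it is not scanned. Actual DASEL™ behavior may differ; the sketch only illustrates the cursor-style, boundary-string extraction that the specification suggests.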



DASEL™ User Interaction Sequence

[Figure: user interaction sequence for the DASEL™ demonstration]




© Copyright 2008, Abacus Programming Corporation