Abacus Programming Corporation:
Systems Engineering and Software Specialists
ABACUS PRODUCT DESCRIPTIONS

Textual Analysis and Data Extraction Toolset


Overview

The Abacus Textual Analysis and Data Extraction Toolset™ (TADET™) is a collection of software tools that allows users to quickly and intelligently extract relevant data from text sources such as Internet web pages, text documents, and text data files. It also lets the user measure the relevancy of extracted data and integrate the data accordingly before displaying it in custom-formatted reports. In addition to a flexible user interface, TADET™ consists of four major components:

* Data Analysis Scanning and Extraction Language™ (DASEL™): a powerful scripting language that lets an analyst create rule-based text scanners that extract useful data from text files. The language contains features for navigation, text matching, data extraction, conditional logic, rejection criteria, data formatting, arithmetic expressions, etc. The associated DASEL™ Interpreter executes the user's scanning rules and extracts relevant data.
* Parallel HTML Downloader™ (PHD™): useful for Internet data extraction applications, PHD™ requests a set of user-specified URLs simultaneously and retrieves the resulting web pages in HTML format.
* TADET Relevancy Analyzer™ (TRA™): locates the most relevant returned data items and sorts the results in order of best match to a user's query or match phrase.
* TADET Report Generator™ (TRG™): formats final results into useful reports.

TADET™ Tool Interaction


TADET™ System Features Summary

* Utilizes a specialized high-level scripting language for text scanner specification
* Produces an audit trail, parse tree, and error reports for error tracking/debugging
* Provides a site selection utility for management of multiple Internet site URLs
* Provides a file browser utility for file selection and management
* Utilizes parallel downloading to retrieve web data from user-specified site URLs
* Conducts relevancy analyses for best results
* Formats results reports
* Utilizes a report generator to produce clean, focused user-defined reports



Overview of TADET Components


Data Analysis Scanning and Extraction Language™ (DASEL™)

The Data Analysis Scanning and Extraction Language™ (DASEL™) is a rule-based scripting language for creating text scanners that extract relevant data from text files, such as HTML retrieved from an Internet site. Scanner rules are written in a text file and submitted to DASEL™ along with the downloaded HTML to be scanned. In addition to the scanner results reports, DASEL™ output includes three reports for debugging the rule set: the Parse Tree Report, the Audit Trail Report, and the Error Log Report. These reports are valuable tools for creating effective rule sets. The DASEL™ system also contains configuration management facilities for URLs, source text files, scanner rule files, output results files, etc.
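
DASEL™ syntax is not shown on this page, so the following is not DASEL™ code. It is a minimal Python sketch of the general idea of a rule-based text scanner, under stated assumptions: the `Rule` class, the `scan` function, and the regular-expression rule format are all hypothetical names chosen for illustration. Each rule pairs a text-matching pattern with a data conversion and an optional rejection criterion, and the scanner returns the extracted items together with a simple audit trail and error log.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One hypothetical scanning rule (not DASEL syntax)."""
    name: str
    pattern: str                                   # regex with one capture group
    convert: Callable[[str], object] = str         # data formatting step
    reject: Optional[Callable[[object], bool]] = None  # rejection criterion


def scan(text, rules):
    """Apply every rule to the text; return (extracted, audit, errors)."""
    extracted, audit, errors = [], [], []
    for rule in rules:
        for match in re.finditer(rule.pattern, text):
            raw = match.group(1)
            try:
                value = rule.convert(raw)
            except ValueError as exc:
                # Conversion failures go to the error log, not the results.
                errors.append(f"{rule.name}: cannot convert {raw!r}: {exc}")
                continue
            if rule.reject is not None and rule.reject(value):
                audit.append(f"{rule.name}: rejected {value!r} at offset {match.start()}")
                continue
            audit.append(f"{rule.name}: extracted {value!r} at offset {match.start()}")
            extracted.append((rule.name, value))
    return extracted, audit, errors
```

For example, a rule such as `Rule("price", r"\$([\d.]+)", float, lambda v: v == 0.0)` would extract dollar amounts from downloaded HTML as numbers and reject zero prices, while malformed amounts would land in the error list.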


Parallel HTML Downloader™ (PHD™)

The Parallel HTML Downloader™ (PHD™) requests multiple URLs from the Internet in parallel and captures the HTML source of each corresponding web page for later analysis. The Downloader is integrated into the TADET™ system and operates in the background without requiring user maintenance.
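
How PHD™ is implemented is not described here. As a rough illustration of parallel page retrieval in general, the sketch below uses Python's standard `concurrent.futures` thread pool and `urllib.request`. The function names `fetch_html` and `download_all`, the timeout, and the worker count are assumptions made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Fetch one page and return its body decoded as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def download_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch every URL concurrently; map each URL to its HTML or its error."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all requests up front so they run in parallel.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing site should not abort the whole batch.
                results[url] = exc
    return results
```

Passing the fetch function as a parameter keeps the sketch testable without network access; a caller would normally leave it at its default and pass only the list of user-specified URLs.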


TADET Relevancy Analyzer™ (TRA™)

The TADET Relevancy Analyzer™ performs post-processing on data items extracted from source text files. Some of the main features are:

* Removal of duplicate results
* Application of nine relevancy metrics to locate the best results
* User-adjustable metric importance weights
* Normalized relevancy scores for sorting
* User-defined rejection criteria

The user supplies a query or series of matching words for the Analyzer to insert into the metric algorithms.  Relevancy metrics are based on text analysis features such as number of word matches, word order, word position, etc.
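
The nine TRA™ metrics and their formulas are not listed on this page. The sketch below only illustrates the overall flow described above (duplicate removal, user-defined rejection, weighted metrics, normalized scores, sorting) using three made-up metrics loosely based on the named features: word matches, word order, and word position. All function names, metric definitions, and the weight format are assumptions.

```python
def match_count(query_words, item_words):
    """Fraction of query words that appear in the item."""
    if not query_words:
        return 0.0
    return sum(w in item_words for w in query_words) / len(query_words)


def word_order(query_words, item_words):
    """Fraction of adjacent matched query words that keep their order."""
    positions = [item_words.index(w) for w in query_words if w in item_words]
    if len(positions) < 2:
        return 0.0
    in_order = sum(a < b for a, b in zip(positions, positions[1:]))
    return in_order / (len(positions) - 1)


def early_position(query_words, item_words):
    """Higher when the first matching word is near the start of the item."""
    hits = [i for i, w in enumerate(item_words) if w in query_words]
    if not hits:
        return 0.0
    return 1.0 - hits[0] / len(item_words)


# Hypothetical metric table; the real product applies nine metrics.
METRICS = {
    "match_count": match_count,
    "word_order": word_order,
    "early_position": early_position,
}


def rank(query, items, weights, reject=None):
    """Return (score, item) pairs sorted best first, scores in [0, 1]."""
    query_words = query.lower().split()
    total_weight = sum(weights.values())
    seen, scored = set(), []
    for item in items:
        key = " ".join(item.lower().split())
        if key in seen:                      # removal of duplicate results
            continue
        seen.add(key)
        if reject is not None and reject(item):  # user-defined rejection
            continue
        words = key.split()
        raw = sum(weights[name] * METRICS[name](query_words, words)
                  for name in weights)
        # Dividing by the total weight normalizes the score to [0, 1].
        scored.append((raw / total_weight if total_weight else 0.0, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Because each metric returns a value between 0 and 1, dividing the weighted sum by the total weight keeps scores comparable even when the user changes the importance weights.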


TADET Report Generator™ (TRG™)

The TADET Report Generator™ (TRG™) formats the data items and text extracted by the scanners into reports. Its formatting features include:

* Data item row and column positioning
* Font type and size control
* Paging control
* Bold face warning based on defined conditionals
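
The TRG™ report definition format is not shown here. As a loose plain-text illustration of column positioning, paging, and conditional warnings, the sketch below lays rows out in fixed-width columns, splits them into pages, and marks rows that meet a warning condition with `**` in place of bold face. The `format_report` function and its parameters are hypothetical, and font control is outside what plain text can show.

```python
def format_report(rows, columns, warn=None, page_size=20):
    """Lay out rows in fixed-width columns and return a list of page strings.

    columns is a list of (header, key, width) tuples; warn is an optional
    predicate that marks a row for emphasis.
    """
    header = "  ".join(title.ljust(width) for title, _, width in columns)
    pages = []
    for start in range(0, len(rows), page_size):
        lines = [header, "-" * len(header)]
        for row in rows[start:start + page_size]:
            # Pad or truncate every value to its column width.
            line = "  ".join(str(row[key]).ljust(width)[:width]
                             for _, key, width in columns)
            if warn is not None and warn(row):
                # Plain-text stand-in for a bold face warning.
                line = "**" + line.rstrip() + "**"
            lines.append(line)
        pages.append("\n".join(lines))
    return pages
```

A call such as `format_report(rows, [("Item", "item", 8), ("Price", "price", 7)], warn=lambda r: r["price"] > 100, page_size=2)` would produce two-row pages with any row priced over 100 flagged.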




© Copyright 2008, Abacus Programming Corporation