IFR Story and Mission
For more than 50 years engineers have been trying to solve the problem of machine cognizance. If we could build machines that understand what is happening around them, we could automate the planet!
At Colorcom, we have made great progress toward this Holy Grail. We have reached a milestone with the development of a raster to vector converter technology called IFR (Indirect Formularizing Resolution). This technology represents the solution to many critical problems currently presented by unintelligible graphic information.
Today's Technology Problems
Why is visual information unintelligible? Suppose you were having a telephone conversation with a colleague who was in another city and you asked him about the weather. If he were a computer, his response would be, "I am looking out over the horizon and the first dot in the upper left corner is 75 parts red, 15 parts green, and 192 parts blue." Continuing a description of the weather in this way would probably require an additional 1,000,000 unrelated dots. As described, this is not a very efficient route to intelligent recognition, but it does represent the current state of affairs with raster data.
What is IFR Technology?
The IFR approach is different, in that the raster data is processed in an exact fashion, using a massive digital front-end. Taking over 14 years to develop, IFR understands any combination of raster data that can be presented. It is a total solution approach.
Through IFR, raster data is translated into a more abstract data form. These data representations are then transformed into several other representations and then finally converted into vector form.
Since people perform raster to vector conversions quickly and easily in their heads, the challenge was how to learn from this process and reproduce the conversion. This translation could not be accomplished if humans had perfect recognition. IFR development is based on the fact that humans make mistakes in their perception. We have translated this understanding into a new and revolutionary technology called IFR.
From a philosophical viewpoint, if the technology works with visual perception, then it should work when applied to the other senses. Experimentation with IFR supports this philosophy. We believe that IFR is equally applicable with many digitized sensors such as sound, touch, humidity, acceleration, etc. The possibilities seem endless.
In all sensor data, an intelligent aggregation of the data needs to be performed before meaningful information can be extracted.
Looking at a complex raster picture where every dot is a different color from any of its neighbors, we would need to see similarities between the selected dots and their neighbors before we could decipher artifacts or images in the picture.
Even if a theoretical "super-human" could see all of the pixel to pixel color changes in a picture, he would need to be able to ignore the small color changes between pixels to understand or decipher the picture as well as a mere mortal. Intelligence demands that the pixels be aggregated before artifacts can be extracted from a picture.
In order to fully understand the picture, the "super-human" would need to generalize similar colors and classify them by lumping them into categories. This process is the first step toward computers gaining intelligent understanding of the world around them.
Our internal system of generalization has much to do with why certain colors seem to clash with others. By looking at clashing colors, we can learn about how humans generalize and classify colors. The visual information in pictures gives us clues that allow us to lump together some adjacent pixels and make it easier to decipher the picture.
One of the first questions asked about IFR is how the technology handles complex colors. In other words, how are images handled in a 24-bit color picture where almost every pixel has a different color?
A strategy that is critical to the overall success of IFR is to keep machine cognizance relevant to human perception. Humans only see somewhere between 17 and 19 bits of color. The last 5 or 6 bits of a 24-bit color palette are wasted except for various interpolation schemes that increase resolution.
If a picture is perceived in a low noise environment such as scanning, there is not much fluctuation that exceeds 4 to 8 color levels (i.e., 2 or 3 bits). Therefore, in the case of scanning, we can aggregate at perhaps 20 bits of color where we would be above the color noise and beyond human ability. If a worst case analysis is done from a theoretical noise perspective, about eight color levels are needed to get around the noise.
As a test, we filmed video using a poor quality VHS camcorder. This was sampled at 4 times the color burst which added a great deal of noise. Over-sampling of clocked data can cause bias due to the violation of the Nyquist sampling theorem. There is at most one color per quadrant of the wave cycle. We found that the video noise peaked at about six color levels at 24-bit color. As a sanity check, we did the same test on a weak aerial TV signal. Theoretically, this noise would be less than that of the camcorder because of the poor color-burst phase lock on old VHS systems. Our suspicions were correct. The noise on this picture was 4 to 5 color levels.
Slew rate distortion (e.g., a slow video amp) can enter into this picture. There are cases where color trends, like an intended smear, can exist without adjacent pixels being the same; however, it is easy to distinguish between invalid slews and valid smears.
IFR includes algorithms that handle all of these concerns. They are implemented as simple state machine functions that do not require a math library.
IFR eliminates the color problem with a color generalization algorithm that lumps adjacent color pixels together. In some ways this could be considered lossy; however, it should be considered intelligent. In the section entitled "Artifact Aggregation," we pointed out that a picture must have artifacts to have any visual intelligence. This does not occur unless adjacent pixels are lumped together to decipher the artifacts.
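The idea of lumping adjacent pixels can be illustrated with a minimal sketch. This is not IFR's color generalization algorithm, which is not described here; it is a plain flood fill, and the function name `lump_adjacent_pixels` and the per-channel `tolerance` of 8 levels (borrowed from the worst case noise figure quoted earlier) are assumptions made for illustration.

```python
from collections import deque

def lump_adjacent_pixels(image, tolerance=8):
    """Label 4-connected regions of an RGB image (a list of rows of (r, g, b) tuples).

    Two neighboring pixels join the same region when every channel differs
    by fewer than `tolerance` levels. Returns (labels, region_count).
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    region_count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # already lumped into an earlier region
            labels[y][x] = region_count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    close = all(abs(a - b) < tolerance
                                for a, b in zip(image[cy][cx], image[ny][nx]))
                    if close:
                        labels[ny][nx] = region_count
                        queue.append((ny, nx))
            region_count += 1
    return labels, region_count

# A tiny 2 x 3 picture: noisy dark pixels on the left, noisy light pixels on the right.
dark, dark_noisy = (10, 10, 10), (13, 9, 12)
light, light_noisy = (240, 240, 240), (243, 238, 241)
picture = [[dark, dark_noisy, light],
           [dark_noisy, dark, light_noisy]]
labels, region_count = lump_adjacent_pixels(picture)
print(region_count, labels)
```

Because each pixel is compared only with its immediate neighbor, a slow gradient can chain many slightly different pixels into one region, so this sketch makes no attempt to tell a valid smear from an invalid slew.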
IFR intrinsically borders the smears and treats them as blobs, but the first version of IFR will not have the ability to handle adjacent smears efficiently. This is not a problem; it is just that current IFR development has not placed this as a high priority.
In the various video pictures that were tested, there were no smears to be found except for the intentional smear that was part of the video color pattern generator test, and this smear was not adjacent to another one.
The solution is part technical and part social. With current state of the art computing, it takes the computer an inordinate amount of time to render a complex picture. IFR has the ability to over-aggregate colors, which eliminates insignificant detail.
Many pictures contain insignificant details that are not needed or desired. When IFR compression is taken into the lossy mode, aggregation can actually become an effective advantage. For example, a high-resolution scan of a 35mm photograph might show creases and wrinkles in the backdrop. These slight shade differences are easily eliminated when the picture is over-aggregated.
From Accelerated I/O:
Scheduled: Winter of 2009 Price: $17.97 each in quantities of 1,000 Download - Free trial for 60 days
* Compression: Pac-n-Zoom Pre.Vu offers black and white compression that is projected to be 10 to 100 times more than TIFF group 4. The advantage is greater for larger documents.
* Image Restoration: The image is partially restored.
* Free Viewer: The program comes with a viewer that may be freely distributed.
* Interface: A flexible interface allows the program to easily slip into a custom solution.
* Searches: The text inside the scanned image can be searched with a standard Internet search engine. The program comes with an OCR engine, or an external OCR engine may be used.
Scheduled: Spring of 2009 Price: $47.97 each in quantities of 1,000
* Pac-n-Zoom Color: Pac-n-Zoom Color is a major upgrade of Pac-n-Zoom Pre.Vu. All of Pac-n-Zoom Pre.Vu's features are still included.
* Color: Color will be compressed at photographic quality. Colored text will compress at approximately the same ratios as black and white text. Photographs are projected to compress twice as much as JPEG but with better quality.
* Image Restoration: Image restoration is enhanced.
Scheduled: Winter of 2010 Price: $129.95 in small quantities or $77.97 each in quantities of 1,000
* Pac-n-Zoom Math: Pac-n-Zoom Math is a major upgrade of Pac-n-Zoom Color. All of Pac-n-Zoom Color's features are still included.
* Sizing: The size of the picture can be changed without adding artifacts.
* Compression: Vector based compression will increase the amount of compression by about 5 times over Pac-n-Zoom Color.
* Image Restoration: Pac-n-Zoom Math sets the highest standard of image restoration.
* OCR: With the first update, Pac-n-Zoom Math will gain the ability to recognize text (Optical Character Recognition or Text Recognition) with human-like accuracy.
* Audio: Audio files will be added in the second update of Pac-n-Zoom Math. They will offer better fidelity and about 10 times (estimated) more compression than current audio formats.
* Speech: The second update will give Pac-n-Zoom Math speech recognition with better than human accuracy.
* Customization: The viewer will be open sourced to allow custom vector solutions. Speech and text recognition demonstrate how the viewer can solve a wide range of problems. Over several months, we expect many others to solve their problems with the viewer toolset.
Purpose
In 2005, document handling was a $19 billion industry (from IDC), but (from ImageTag) "Current paper-to-digital solutions capture less than 1% of the paper headed for the file cabinet." Digital documents have cost, access, speed, organization, durability, efficiency, environmental, competitive, discovery, and other compelling advantages over paper documents, but conventional technology offers either a good image or good compression, not both at once, which leaves no feasible solution. In this paper, we will analyze and demonstrate the methods needed to get both quality images and unprecedented compression at the same time.
Document Handling White Paper:
Paper-based filing systems are several thousand years old and found in every office across the world, yet digital documents could exude many clear advantages over paper-based filing systems if they simultaneously exhibited good compression and quality images.
1. Cost: Digital documents are nearly free, and they are getting cheaper with each passing day. Hard drives and other disk drives keep increasing in capacity, but their prices remain the same. With state of the art compression, such as Pac-n-Zoom®, a 10 page document only requires about 10 KBytes, which is 1 KByte per page. A 250 GByte hard drive currently costs about $80.00. One page can then be stored for $80.00 / 250 GBytes * 1 KByte = 32 millionths of a cent, and it can be backed up for much less than that. By comparison, it costs about 0.8 cents for the paper and another 2.4 cents for the ink to print a sheet of paper. This means that it costs about (0.8 + 2.4) / 0.000032 or 100,000 times as much to print a page as it does to write it to a disk. Copying all of these papers takes an average of 3% of all revenue generated by US businesses. If we assume 20% margins, then office copies cost us 15% of our income. Besides having numerous other competitive advantages, digital documents are much cheaper.
2. Access: The value of a document is realized when someone views it. The cost of copying a digital document is less than a billionth of a cent, and the copies can be simultaneously shipped at the speed of light to most places on all 7 continents. With the mobile and far-flung work force of today, it is increasingly unlikely that the needed document will be in the same office as the employee who needs it. There are many times when files need to be viewed by people outside the company. Digital storage can provide cheap and manageable access to the IRS, SEC, or civil subpoenas. By providing access to vendors and other outsourcing agents, many processes can be optimized. Digital documents can be served to almost anyone in any place for next to nothing.
3. Speed: Any accountant can tell you that time is money. In a well designed system it takes less than two seconds to click on a link and retrieve a digital document, but on average, it takes about 20 minutes to search, retrieve, deliver, and re-file a paper document. There is no need to put the digital document back. When these facts are taken together, digital files are about 600 times faster than paper files even when paper files are in an ideal setting, which is an increasingly bad assumption. Faster access equals a quicker response to the customer's needs, which translates to greater profits.
4. Organization: Even if a filing clerk is able to label the files in a way that the files can be found, there is almost no chance that the appropriate and necessary cross-references are included. Very few documents have the luxury of starting at the beginning and telling the whole story through to the finish. Most documents are a thread in a tapestry of thought, and without the appropriate cross references, most paper documents are out of context. The author of a digital document can link in cross references, and the organization is not left to the filing clerk who probably doesn't know what they should be. Without these vital links, 60% of employees waste more than 60 minutes each day by duplicating work that has already been done (from www.openarchive.com). Paper files are so much trouble that the average disorganized office manager has 3,000 documents just "lying around" (from a study by US News and Report). The costs from late fees, premium prices, and other chaos expenses can eat up to 20% of the entire budget. 80% of a company's information exists as unstructured data that is scattered throughout the enterprise (from KMWorld). The paper files are ironically the most disorganized part of many organizations. Digital files are easier to find, integrate, and organize.
5. Durable: (From World-scan.com) "More than 70% of today's business would fail within 3 weeks if they suffered a catastrophic loss of paper-based records due to fire or flood." For their part, digital documents tend to survive these unforeseen events. For example, nearly all the paper documents were lost but almost all the digital documents survived the World Trade Center tragedy. It doesn't take a catastrophe to lose paper. 22% of all documents get lost and 7.5% are never found which wipes out 15% of all profits. For their part, digital files are not normally removed from the filing cabinet which dramatically reduces the chance of losing the document.
6. Efficiency: In many cases, it is easier to outsource some corporate function than it is to "reinvent the wheel" in-house. The primary problem of small companies (according to the Boulder County Business survey of companies in Boulder County) is the handling of government paperwork. The second largest problem these companies have (according to the same survey) is the handling of personnel. A company could specialize in handling government paperwork or personnel issues and be far more efficient than everyone doing the same paperwork themselves, but the specialized companies would need access to the necessary files, which digital files allow. Digital files allow managers to easily verify the existence and accuracy of all the paper trails, which is nearly impossible with paper files. In these and many other ways, digital files make a company much more efficient than paper files.
7. Environmental: In 1993 US businesses used 2 million tons of copy paper, and in 2000 this waste grew to 4.6 million tons or more than 92 billion sheets of paper (Document Magazine). Since it takes 17 trees to make a ton of paper, the US used 78 million trees worth of paper in 2000. To make matters worse, the use of paper is constantly and quickly increasing. As shown above, the use of paper more than doubled within 7 years, and paper already accounts for 40% of the municipal solid waste stream. Digital copies leave almost no environmental scars.
8. Competitive: A winning team plays together. While businesses increasingly organize and automate around the computer, paper documents resist efforts to increase productivity. When a company automates paper processes, it gains a clear advantage over its competitors. Competitive companies are trying to move faster. For example, law firms manage millions of pages of documents, and it is imperative to a court case that the right documents and case files are available to the right person at the right time. To quote Ali Shahidi of Alschuler, Grossman, Stein, and Kahan LLP, a Santa Monica law firm, "We're doing things we couldn't have imagined a few years ago. We're smarter, better, and more nimble." At the present time, 99% of paper documents cannot be analyzed, automated, and organized by a computer. The paper part of the office is a remnant from the last century, and it doesn't allow a company to move forward with the modern techniques of the information age.
9. Discovery: As the Internet has proven, digital information is the easiest information to find. In a typical office, if an important document is viewed by an employee, there is a significant chance that the document will be lost forever. The average white collar worker spends 1 hour each day looking for lost documents (from Esselte). Since digital documents are not removed from the filing cabinet, they are unlikely to be lost. The searcher may not know what label the file was filed under. With the ability to contain links, digital documents are easier to cross reference and maintain context. The text of most digital documents can be recognized by the computer, which allows the searcher to search for phrases inside the document. Digital documents are easier to find than paper files because digital documents are unlikely to be lost and digital text is searchable by a computer.
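The arithmetic behind several figures in the list above (storage cost per page, the print-to-disk cost ratio, copy cost as a share of income, retrieval speed, and trees consumed) can be reproduced with a short script. The variable names are illustrative, and decimal units (1 KByte = 1,000 bytes, 1 GByte = 1,000,000,000 bytes) are an assumption; the input figures themselves are the ones quoted in the text.

```python
# Item 1 (Cost): storing one compressed page on a hard drive.
DRIVE_PRICE_DOLLARS = 80.00
DRIVE_CAPACITY_BYTES = 250e9   # 250 GBytes, decimal units assumed
PAGE_SIZE_BYTES = 1e3          # about 1 KByte per compressed page
PRINT_COST_CENTS = 0.8 + 2.4   # paper plus ink for one printed sheet

storage_cents_per_page = (
    DRIVE_PRICE_DOLLARS / DRIVE_CAPACITY_BYTES * PAGE_SIZE_BYTES * 100
)
print_to_disk_ratio = PRINT_COST_CENTS / storage_cents_per_page

# Item 1 (Cost): copies cost 3% of revenue; with 20% margins that is 15% of income.
copy_share_of_income = 0.03 / 0.20

# Item 3 (Speed): 20 minutes per paper retrieval versus 2 seconds per click.
retrieval_speedup = (20 * 60) / 2

# Item 7 (Environmental): 4.6 million tons of paper at 17 trees per ton.
trees_in_2000 = 4.6e6 * 17

print(f"storage cost per page: {storage_cents_per_page:.6f} cents")
print(f"printing costs about {print_to_disk_ratio:,.0f} times as much as disk storage")
print(f"copy cost as share of income: {copy_share_of_income:.0%}")
print(f"digital retrieval speedup: {retrieval_speedup:.0f}x")
print(f"trees used in 2000: {trees_in_2000 / 1e6:.1f} million")
```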
As we have shown, digital files have many advantages over paper, but only about 1% of the files in the filing cabinet have been digitized. With current technology, it is not practical to digitize a large percentage of the documents. Users can get a good image (e.g., JPEG) or good compression (e.g., TIFF G4), but they cannot get both a high quality image and small file size at the same time.
If the text is big and black and if the paper is clean and white (with no writing), a threshold segmenter followed by a statistical compressor, such as TIFF G4, yields file sizes that are useable (if a little annoying) on a LAN. Threshold segmentation is not the "silver bullet" people need, and the resulting images have a FAX-like quality. If we were looking at the yellow carbon copy of a receipt, much of the information would be lost. With threshold segmentation, TIFF G4 has adequate compression but poor image quality.
JPEG has moderately good image quality because it skips segmentation altogether, and simply performs a discrete cosine transform (with Huffman encoding) on the image. The quality comes with a price. Since all the noise is left in the image, the compression is relatively small. The user will have to wait a long time for the file to serve, transmit, and load. If the file were being transmitted over the Internet, the user could easily be waiting tens of seconds to view a single page. Without segmentation, JPEG has a good image but poor compression.
To move the files from the filing cabinet to the computer, we need a solution that has both a good image and high compression.
Threshold segmentation is usually not good enough to handle the spectrum of document needs a business has.
For example, a common receipt might have blue printing on yellow paper with a red number in the upper right-hand corner. It is not uncommon for some of the blue printing to be very fine while the issuing company's name is in big bold print. People often write and stamp on the receipt. The colors they use don't have to conform to those on the receipt. The receipt might be a carbon copy with the attendant image degradation. The handling of the receipt might be an issue. It might have gotten dirty or crumpled, and these are only a small sample of image degradation possibilities.
A simple receipt can challenge the computer's ability to store the image without loss while achieving useable compression, and there are many applications (such as X-rays, drawings, and others) that are even more difficult.
Threshold segmentation is fast and easy, but the quality of segmentation often falls short of what is needed to do the job. In fact, the standard that everyone is held to is human-like segmentation. If people can't see what is on the receipt, then the receipt carries the blame.
Few things shed blame as well as humans. A compression project with less segmentation than a human will dam the river of blame and divert it through the IT department.
Maybe the accounts payable person did write "paid" on the invoice that was paid twice, or maybe the necessary scrawl was omitted. Either way the computer will be blamed if it has a reputation for missing such things. To avoid all of these accusations, the document imaging system needs to segment like a human (or better).
This means that threshold segmentation is not good enough except in certain conditions where image quality can be guaranteed. In fact, the only acceptable segmentation for most of the paper in the office is human-like segmentation. Humans set the standard.
When the computer industry started moving paper documents onto the computer, segmentation was used to create a higher contrast (or sharper) image. A segmented image is more easily compressed, because there are fewer artifacts to compress. Segmentation removes many small defects (we call noise) out of the picture.
These are all convenient reasons to use segmentation, but unlike our conventional document imaging methodology, humans require segmentation to achieve extraction.
When we use our senses to perceive something, we are actually performing several steps. Since we use our senses so much, these steps have become second nature to us, and may even be performed in the subconscious.
The first step, which we call segmentation, is that of grouping like shades together. Let us use a black letter 'e' on a white background as an example. The black would probably be many different shades up and down the 'e' (here is a typical example of a black 'e' scanned from a white sheet of paper), but we would need to group all of them together before we could recognize the letter 'e'.
While segmentation might seem as if it adds artificiality to the picture, recognition requires segmentation. In other words, if we want the picture to mean anything to us, we will have to segment it in our head.
The process of finding a shape (or any feature) of an object is called extraction. The extracted shape is compared to memory (probably a database but isn't necessarily limited to template matching). If we can match the shape to a shape stored in memory, we recognize the shape. In our example, the segmentation must bring the "e" in as a single region, or we will extract a shape not recognized by the database. In other words, when parts of a letter are missing, it is difficult to identify the partial letter.
The document imaging industry's use of segmentation to achieve OCR extraction is similar, but threshold segmentation is often a fatal shortcut that prevents robust extraction. For example, a computer cannot perform OCR extraction with the accuracy a human can in a multiple color environment.
As computers get stronger, they can afford to forsake threshold segmentation for the more robust edge detection segmentation that is more intelligent and human-like.
A computer with intelligent human-like segmentation isn't limited to more accurate extraction. As the computer begins to aggregate the image through edge detection segmentation, it can also achieve much better compression than it could with threshold segmentation.
To understand how these things occur, let's backtrack to explain the details of each type of segmentation.
Threshold segmentation is the simplest type of segmentation. In a simple example, if we had black text on white paper, we could set the threshold to gray. All of the text that is darker than the gray threshold would be considered black, and any background that was lighter than the gray threshold would be considered white. When the segmentation is complete, there are two colors in the picture.
Threshold segmentation can be much more complicated than this. For example, a histogram of colors could be taken across a region or an entire picture. The more predominant colors can be considered the foreground and background. The threshold could be set at the middle color between the foreground and background colors. Of course as threshold segmentation becomes more complicated, it runs slower.
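Both variants described above can be sketched in a few lines. This is a minimal illustration on a grayscale image stored as a list of rows, not any product's segmenter; the function names and the default threshold of 128 are assumptions. The histogram variant simply takes the two most common gray levels as foreground and background, which is cruder than what a real scan would need, since noise usually spreads each color over several nearby levels.

```python
from collections import Counter

def threshold_segment(gray, threshold=128):
    """Map every pixel darker than `threshold` to black (0) and the rest to white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in gray]

def histogram_threshold(gray):
    """Return the midpoint between the two most common gray levels in the image."""
    counts = Counter(value for row in gray for value in row)
    if len(counts) < 2:
        raise ValueError("need at least two distinct gray levels")
    (first, _), (second, _) = counts.most_common(2)
    return (first + second) / 2

# Dark text pixels (around 20) on a light background (around 240), with two stray shades.
scan = [[20, 20, 30, 240],
        [240, 240, 20, 200]]
midpoint = histogram_threshold(scan)
two_color = threshold_segment(scan, midpoint)
print(midpoint, two_color)
```

After segmentation only two values remain in the picture, which is what makes the result easy for a statistical compressor to handle and also why shades such as the 200 above are lost.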
In a standard 8.5 inch by 11 inch sheet of paper scanned in at 300 dots per inch with all three primary colors, we have 8.5 * 11 * 300 * 300 * 3 or 25,245 KBytes of data, if we assume 8 bits per primary color. Even today's computers take a noticeable amount of time to chew through 25 MBytes of data. Therefore, threshold segmentation (typically a simple variety) is usually used.
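The page-size figure can be checked with a small helper. The function name `raw_scan_bytes` and its parameters are illustrative, and a KByte is taken as 1,000 bytes to match the figure in the text.

```python
def raw_scan_bytes(width_in=8.5, height_in=11, dpi=300, channels=3, bytes_per_channel=1):
    """Uncompressed size in bytes of a scanned page at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bytes_per_channel

page_bytes = raw_scan_bytes()
print(f"{page_bytes // 1000:,} KBytes")
```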
Threshold segmentation should be considered a fast but coarse segmentation. Threshold segmentation does not address a number of segmentation problems required by human-like segmentation.
For example, transition distortions are usually the largest distortion introduced by a scanner, and threshold segmentation does not find or rebuild the border. In many cases, a pattern of colors is mixed together to provide some information (for example a logo), and threshold segmentation will not be able to sort through the color complexities with human-like intelligence.
Edge Detection Segmentation
To segment like a human, the segmenter needs to mimic the human process of edge detection segmentation. For about 30 years, people have been trying to achieve high quality with edge detection segmentation, but it has only recently been accomplished.
Edge detection segmentation does a better job of supporting image restoration. The edge of a blob is distorted in a variety of ways, and the edge needs to be discovered before it can be restored.
In the past, most edge detection segmentation was too coarse to be used in document imaging, but computers can theoretically handle edge detection better than humans. Humans can only see about 200,000 colors, but computers typically work with 16 million colors. The optoelectronic sensing technology is capable of many more.
When the computer does a better job of segmenting, it can do a better job compressing. If the picture is over segmented, the baby is thrown out with the bath water. In other words, the data washes out along with the noise. With better segmentation, more noise can be corrected while leaving the data.
The input distortion from the optoelectronic capture appliance (usually a scanner) prevents high levels of compression in color documents; so a big size and quality difference can currently be found between color documents created on a computer and those scanned from a paper.
For example, a scanned color document can be compressed about 3 times with a statistical compressor, but the same file created on the computer could be compressed about 100 times. Furthermore, the computer generated document would be much cleaner and clearer.
Edge detection segmentation is a much more complicated and computer intensive segmentation technique. At first, edge detection might seem simple, but it is complicated by several factors.
1. Continuous Tone: We may not be segmenting along a clear color transition. In fact, the image could be continuous tone with nearly unlimited color variations.
2. Image Distortion: Clear color transitions are usually smeared 4 or 5 pixels in two directions. Finer details become blurry, and humans are able to segment some of this. Then the humans expect the computer to segment at human level.
3. Fine Artifacts: Text (of a specific font) comes with a finite set of artifacts, and they have a minimum size. Artifacts below the minimum can be ignored in text, but they must be segmented in continuous tone.
As we would expect, edge detection segmentation takes much longer than threshold segmentation.
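To make the contrast with thresholding concrete, here is a deliberately naive edge detector. It is only a sketch of the general idea of looking for color transitions between neighbors, not the edge detection segmentation discussed in this paper; the name `edge_map` and the `min_step` of 8 levels are assumptions. It marks a pixel as an edge when the step to its right or lower neighbor is at least `min_step` gray levels.

```python
def edge_map(gray, min_step=8):
    """Return a grid of booleans marking pixels that sit on a sharp gray-level step."""
    height, width = len(gray), len(gray[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Difference to the right and lower neighbors; 0 at the image border.
            right = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < width else 0
            below = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < height else 0
            edges[y][x] = max(right, below) >= min_step
    return edges

# Noisy dark columns on the left, noisy light columns on the right.
patch = [[10, 12, 200, 202],
         [11, 13, 201, 203]]
edges = edge_map(patch)
print(edges)
```

Even this toy version visits every pixel and compares it with two neighbors, and it still fails on the cases listed above: a transition smeared across 4 or 5 pixels produces several small steps rather than one large one, and continuous tone has no clear step at all. Handling those cases is where the extra computation goes.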