Glossary

CIFS or "Common Internet File System" is a protocol which allows remote file-systems (directories, files) to be mounted as drives on MS Windows machines. It is based on the original Microsoft SMB protocol. CIFS is normally not considered secure enough for internet use because it is vulnerable to attacks. However, if combined with a secure SSH tunnel or virtual private network (VPN), it is as secure as it would be on a local network.

Webrecs does not expose the CIFS port to the internet. The only way in which Alfresco spaces can be mounted as drives in Windows Explorer is through an SSH tunnel or VPN, which requires a small download and install. Webrecs provides instructions for this (see "Installing OpenVPN on XP" or "Installing OpenVPN on Vista") so that this extremely useful facility, which allows Alfresco to be viewed as a drive in Explorer and hence allows such things as drag-and-drop, can be utilised securely over the internet.

Collaboration software is the class of software which allows people to collaborate with one another, often in real time and often across geographically separated environments. The features which matter most in collaboration products are the easy creation and manipulation of workgroups, mailing lists and forums, and the ability to perform document management functions, rudimentary workflow functions and instant-messaging functions.

Webrecs provides direct collaboration functionality in the ability to start discussions on items, mail items for comment, invite users to participate in spaces and groups, and maintain a record of discussions about items. To view some of the collaborative features of Webrecs, check out some of our online videos at the Webrecs Wiki, particularly "working with documents".

Concurrent users are the number of users actually logged into the system at a given time. This is not the same as the number of users registered to use the system. Depending on usage patterns, concurrent users could be anything from 10% to 80% of registered users. Concurrent usage is often enforced as a licensing constraint in software - after the licensed number of users have logged in, subsequent users get their logins rejected. Webrecs does not enforce concurrent usage; however, we do provide recommendations as to what subscription sizes are appropriate for different usage levels. The approach we take is that YOU determine whether you need a larger subscription based on the performance you experience and the feedback from your users. Upgrading subscriptions is very easy - just access your Control Panel and upgrade. The effect is almost immediate.
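As a rough illustration of the 10% to 80% range quoted above, the following Python sketch turns a registered-user count into a likely range of concurrent users. The function name and the figure of 200 registered users are assumptions made for this example only; this is not Webrecs sizing guidance.

```python
def concurrent_user_range(registered_users, low=0.10, high=0.80):
    """Return (minimum, maximum) likely concurrent users.

    The default 10% and 80% bounds are the range quoted in the text;
    real usage patterns may differ.
    """
    if registered_users < 0:
        raise ValueError("registered_users must be non-negative")
    return round(registered_users * low), round(registered_users * high)


# Hypothetical example: an organisation with 200 registered users.
minimum, maximum = concurrent_user_range(200)
print(f"Expect between {minimum} and {maximum} concurrent users")
```

The width of the range is the point: the same registered-user count can imply very different loads, which is why the subscription size is best judged from actual experience.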

Document Management refers to the software area which takes care of the lifecycle of electronic documents (particularly editable ones like Word and PDF). Traditionally it was associated with viewing, versioning (check-in, check-out), printing and indexing, but it now also encompasses the review, marking-up, publishing (to web or paper) and storage cycles. The distinction between document management, imaging, records management and collaboration is becoming increasingly blurred (which is why the term ECM or "Enterprise Content Management" has become popular).

ECM

ECM or "Enterprise Content Management" is a term used to describe the integration of many facets of the document lifecycle from authoring to reviewing to collaboration to displaying. It encompasses traditionally separate fields such as document management, imaging, records management and collaboration.

Most of the big vendors (Microsoft, IBM, SAP, Oracle) support some or all of these to a greater or lesser extent through the acquisition of companies which fill in the missing pieces; however, the integration is often not as tight as might be expected and some "dumbing down" of best-of-breed features is inevitable. There are also open-source products which are highly configurable and integrated, most notably Alfresco.

Free-text searching (sometimes known as "full-text searching") is an essential tool in our quest to find documents and pages in an environment where there are large collections of uncategorised documents. Many people rely daily on free-text search facilities such as Google or AltaVista to locate relevant documents on the Web. Free-text search engines are also used in many internal document storage systems to enhance search capabilities beyond categorised searches.

It is useful to understand the difference between free-text search and categorised search in such systems. Categorised search is searching on indexes which have been applied to specific documents, usually context-related. For example, in a medical system there might be index fields "patient number" and "patient name", which are given values when the document is entered into the system. Searching on these values provides a quick and easy way to locate documents which are relevant to a particular patient, and invariably there is a special search screen which allows the required search values to be entered into specific placeholder fields. With free-text search, however, there are no dedicated index fields set aside for the purpose - the WHOLE document is searched for keywords or phrases which match an entered value. The advantage of this is that no data-entry overhead is required at the time of document import, and no special context-related screens need to be provided. The disadvantage is that searches often produce less focussed results, which means more "trawling" through a list of returned results to find the required document.

The best solution is a mix of both - categorised searches are used to narrow down the list of candidate documents, and free-text search can then look for phrases within that list. For example, searching on:

  • Customer name : "Bloggs"
  • Free text : "kidney + stone"

is likely to produce a result list much shorter and more relevant than either type of search alone.
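The combined search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample documents, the `customer_name` field name and the `combined_search` function are hypothetical, and the free-text step is a simple substring match rather than a real search engine.

```python
# Hypothetical sample data: each document has one index field and its full text.
documents = [
    {"customer_name": "Bloggs", "text": "Referral letter: suspected kidney stone, scan booked."},
    {"customer_name": "Bloggs", "text": "Invoice for consultation."},
    {"customer_name": "Nurk", "text": "Kidney stone removed without complications."},
]


def combined_search(docs, customer_name, keywords):
    # Step 1: the categorised search narrows the candidates using the index field.
    candidates = [d for d in docs if d["customer_name"] == customer_name]
    # Step 2: the free-text search keeps candidates whose text contains every keyword
    # (a simple case-insensitive substring match, for illustration only).
    wanted = [k.lower() for k in keywords]
    return [d for d in candidates if all(k in d["text"].lower() for k in wanted)]


results = combined_search(documents, "Bloggs", ["kidney", "stone"])
for doc in results:
    print(doc["text"])
```

Either search alone would return two of the three sample documents; combining them leaves only the one document that satisfies both conditions.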

Free-text search technology relies on "free-text indexing", a complex technique which places all words (and in most cases their positions) contained in a text document into a special database which can be searched very efficiently. This indexing process is essential in order to provide the extremely fast lookups to which we have become accustomed. However, it is a fairly resource-intensive process which can cause the computer to become unresponsive, so free-text indexing is often done at scheduled intervals and not continuously. It is for this reason that a free-text search may not find documents which have only just been entered - the scheduled indexing process has not yet been performed on them.
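To make the idea of storing words together with their positions concrete, here is a minimal Python sketch of a positional index and a phrase lookup. It is a teaching example only and makes no claim about how Lucene or Webrecs implement indexing; the function names `build_index` and `find_phrase` and the sample documents are assumptions.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to {doc_id: [positions of that word in the document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index


def find_phrase(index, phrase):
    """Return the ids of documents in which the phrase's words appear consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return set()
    matches = set()
    # Start from every occurrence of the first word, then check that each
    # following word sits at the next position in the same document.
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                matches.add(doc_id)
                break
    return matches


# Hypothetical documents keyed by id.
docs = {1: "Kidney stone found on scan", 2: "A stone near the kidney"}
index = build_index(docs)
print(find_phrase(index, "kidney stone"))
```

Building the index is the expensive step; once it exists, a lookup only touches the entries for the words in the query, which is why searches are fast and why newly added documents are invisible until the next indexing run.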

Webrecs uses the open-source Lucene free-text search engine.

Webrecs saves the text version of an image file as an annotation inside the image PDF file - see PDF.

FTP

FTP or "File Transfer Protocol" is a network protocol designed to allow the transfer of files across networks, often the internet. Because of the security limitations of this protocol, Webrecs does not recommend it as a way of transmitting data. It is far better to use the "Secure FTP" protocol, which is based on SSH. Unfortunately this is not part of default Windows installs; however, this can be easily remedied by downloading and installing one of a number of open-source Secure FTP clients - see the instructions in the Webrecs Wiki - Installing SSH client.

There is also a very good graphical FTP client for Windows which supports Secure FTP as well, available at http://www.coreftp.com/

Webrecs does not allow access to its subscriptions via FTP unless through a VPN.

Hosting is the technique whereby a software application is "hosted" or run on a remote server and accessed through the internet by one or more "clients" who use the processing power, storage and databases resident on the server. This is also known as "SaaS" or "Software as a Service" and is sometimes referred to as "computing in the cloud", the "cloud" of course being the internet. There are a number of advantages to hosted solutions, notably:

  • The clients can be less powerful and are only required to run a browser like Firefox or IE, which implies cost savings
  • Data is centralised and easier to back up and maintain
  • Software is much easier to upgrade since only the server copy is upgraded, not every client
  • Fault-finding is simpler since the application is on a single machine, not multiple
  • It is much more cost effective in terms of licensing
  • It is possible to access the host and application from any PC, hence remote working is easier

The main drawbacks of hosted solutions are:

  • If the internet is down, the application is unavailable
  • Speed - low bandwidth internet links can cause slow responses

Webrecs believes that internet bandwidth and reliability in most Western countries are at a point where neither of these two drawbacks is relevant any more. In addition, the Webrecs Datassure guarantee means that you CAN have your application running locally if this is of critical importance.

Imaging is the collective term for the processes and systems which allow paper to be converted to, and manipulated as, digital documents. Typically these are in TIFF, JPEG or PDF format. Usually the paper is scanned with a colour or black-and-white scanner, but in some cases a fax machine is used to convert and transmit the paper document. A typical imaging cycle would go through the following stages:

  • Scanning
  • Automatic indexing using barcode or optical character recognition
  • Manual indexing and data correction
  • Storage in repository
  • Workflow
  • Retrieval

Webrecs stores most scanned documents in PDF format since most customers find this the most convenient. It is possible to change this to other formats if required (this will be charged as a customisation).

Indexing is the process of assigning values to fields which are associated with a particular document or content type. For example, for a contract we might want to know the names of the parties to the contract, the date the contract was drawn up, and possibly the type of contract. In that case there might be the fields:

  • Contract party 1
  • Contract party 2
  • Contract date
  • Contract type

During the indexing process these fields might be given values, say:

  • Fred Nurk
  • Joe Blow
  • 28 Feb 2007
  • Sale of dog

Well-defined indexing strategies are important to the quick and efficient finding of documents. A balance needs to be found between having too many fields per document type and too few - adding indexes is expensive, especially when they need to be manually keyed in. Today, with efficient free-text search engines, it is possible to restrict the number of indexing fields to a bare minimum and still be able to locate documents during searches.

See the Webrecs "Indexing strategy" article for indexing strategies for your Webrecs subscription.

JPEG or "Joint Photographic Experts Group" is actually a compression technique for colour images (typically stored in "Exif" or "JFIF" format). It is particularly successful with photographic images, where a file size reduction of up to 10 times can be obtained with very little loss of quality. For colour documents (text, straight lines) the reduction is not as pronounced. Most JPEG viewers can handle multiple image formats such as JFIF or Exif when presented with a JPEG-encoded file, so "JPEG" has come to be known as the de-facto colour image standard on the web, and the underlying image type usually goes unnoticed. To further complicate things, it is possible for JPEG images to be contained within TIFF files!

MYOB is the most widely used small-business accounting system in Australia.

OCR

OCR or "Optical Character Recognition" is the software technique of electronically "reading" an image page and producing a text file out of it. The difference between an image and a text file is often not completely understood, since they are both files consisting of collections of bytes. Importantly, with image files (extensions jpg, tif, gif, bmp and others) the bytes describe the positioning of elements on the screen irrespective of what the elements are. Text files (extensions txt, doc, xls, xml, html), on the other hand, contain text described in a character encoding (usually ASCII), sometimes together with positioning and style information describing how that text is to be presented. This means that free-text search engines can take text files and index them into a database which can be used during a search to find all files containing the collection of words entered.

OCR is therefore a critical step on the way to making image files, including scanned documents, accessible for searching. Some scanner products contain OCR engines which automatically convert images to text as part of the scanning process. Commonly used OCR engines are:

  • Readsoft
  • ABBYY
  • Tesseract (open source)

It is important to understand that NO OCR engine is 100 percent accurate; all produce some errors because of "noise" (specks, malformations) inherent in all scanned images. In addition, much of the formatting information (tables, fonts, style) is lost. For this reason it is essential to retain the image version of a document to maintain an accurate copy of the exact document. However, OCR remains an extremely powerful tool, creating a searchable text version of an image file and a link to the associated image file.

Webrecs provides the option to use a scan-time OCR engine (one of the options when purchasing a ScanPack) or a back-end OCR engine which is there by default. The difference is that the back-end OCR does not provide the text output with location information, i.e. it cannot point to the exact location of the text within the image, but will only point to the image, whereas the scan-time OCR can locate the text within the image as well.

The OCRed text version of an image is saved as an annotation to the image in the image PDF file - see PDF.

 

Open source refers to the growing movement of treating source code as a shared resource, thereby allowing a global user community to contribute bugfixes and features to the software product. Open-source products have become an accepted solution for even mission-critical applications since their robustness and feature-richness have been proven over the last 10 years.

Open source is characterised by infrequent stable releases followed by many small update releases, with the effect that bugs are typically found and fixed very quickly. However, there is no need to upgrade unless there is a compelling bugfix or new feature. Open source also provides security in that it is always possible to download and build the source if required, allowing critical changes to be made in-house if necessary, as well as allowing support for old versions of operating systems to be performed in-house. There are an increasing number of organisations that provide support for open-source products. Well-known open-source products include:

  • Apache
  • Linux
  • Alfresco
  • OpenOffice
  • Firefox
  • Tomcat

Webrecs uses best-of-breed open source widely in our product suite, including:

  • Alfresco
  • Tomcat
  • nginx
  • Linux (Ubuntu)
  • OpenVZ
  • Tesseract
  • Postfix
  • MySQL

These are all integrated seamlessly together to provide the Webrecs product family.

PDF

PDF or "Portable Document Format" is the de-facto document format for the web - probably 1 in 3 text-based documents are transferred and viewed as PDF across the internet. The standard is provided and maintained by Adobe, who also provide the most commonly used (but not the only) viewer. The advantages of the PDF format are:

  • Open - non-proprietary, published format
  • Can handle images and text
  • Has backwards compatibility built in (newer versions of the viewer will view older standard PDF files)
  • Rich viewer functionalities including annotations, redaction, simple workflow and mark-up

It is worth noting that a scanned image can be saved as a PDF document with various software packages; HOWEVER, this does not mean that it is text-searchable. Typically the scanned image is encapsulated in a PDF header, but it remains an image (i.e. no text content). For the image to be converted to text it needs to be OCRed and saved as a text-based document.

Webrecs uses the technique of saving the text version of an OCR'ed image document as an annotation in the same PDF file as the image itself. If you look at the top left corner of the PDF viewer (NB: Adobe Reader 8.0 and above) and roll over the small square you will see the annotation. Alternatively, click on the comments field at the bottom left of the PDF viewer.

Quicken is a popular US accounting system owned by Intuit. The Australian version of Quicken is customised for Australian conditions and marketed as Reckon.

Records management is (surprise surprise) the management of records. In the traditional sense records are physical artefacts, and techniques for managing them (covering the lifecycle from creation through to obsolescence) have been around for a very long time. There is a long history of techniques, convention and language. It is only relatively recently that this has been mapped into the electronic domain, and many of the same terms are used here as well (words like "disposition schedules", "retention periods" and "file plans"). Essentially, records management takes over where document management leaves off - once a document has been authored, approved and published (i.e. is in the public domain) it becomes a record which needs to be managed through its lifecycle to its eventual destruction.

Most business documents are records (one definition of a record is some bit of information which you might at some stage wish to rely on in a court of law) and for this reason there is legislation to cover the minimum holding time and storage requirements of these documents.

Webrecs puts you in an ideal position to manage your documents as records, complying with legal requirements for record-keeping by virtue of security, non-repudiation and permanence. We are working on more formal measures to help you stay compliant with such items as minimum retention periods and eventual destruction. The Alfresco framework is ideally suited to this task.

SaaS - "Software as a Service" - see hosting

SSH

SSH or "Secure Shell" is a network protocol which allows data to flow between 2 computers in a secure manner that cannot be read by the rest of the network. It was developed for Unix and Linux to allow secure login sessions (or "shells"). It uses public-key encryption algorithms (among the strongest forms of electronic security available). It has since become commonly used as a highly secure means of transferring files across the internet and of creating secure "tunnels" which allow different types of protocols (e.g. CIFS, FTP, Samba) to work seamlessly across the internet.

SSL

SSL stands for "Secure Sockets Layer". It is a protocol which encrypts the transmission between a browser and its server, preventing eavesdropping on, tampering with or alteration of the data. The cryptographic system used by SSL is widely used to encrypt traffic to and from banks and financial institutions, and for transmitting credit card information to online shops. Correctly implemented with 128-bit keys it is considered extremely secure. It is easy to see whether you are using an SSL link in the browser address line - SSL connection addresses start with "https" as opposed to "http".
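The "https" versus "http" check described above can also be done programmatically. The short Python sketch below uses the standard library's `urllib.parse.urlparse`; the function name `uses_ssl` and the example addresses are assumptions for illustration. Note that it only inspects the address and does not verify the certificate or the encryption strength.

```python
from urllib.parse import urlparse


def uses_ssl(url):
    """Return True if the address uses the "https" scheme."""
    return urlparse(url).scheme.lower() == "https"


# Hypothetical addresses used purely as examples.
print(uses_ssl("https://example.com/login"))
print(uses_ssl("http://example.com/login"))
```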

Webrecs uses 128-bit SSL encryption for all traffic to and from its hosted services. The overhead introduced by the encrypt/decrypt process is somewhat offset by the sophisticated compression technology used by Webrecs. All data to and from the server is compressed, so that the transmission time is reduced. For this reason not much performance loss is experienced.

To use SSL for your Webrecs system you will need to "trust" the Webrecs site certificate. When you first attempt to access your site you may get a scary message saying that the site is not trusted - do not worry about this. Continue to your subscription anyway and, as soon as possible, install Webrecs as a trusted Certificate Authority in your browser as per the Webrecs wiki instructions for Internet Explorer 7 or Firefox. Thereafter all access to your Webrecs subscription will be seamlessly encrypted.

TIFF or "Tagged Image File Format" is a format for describing images within data files. TIFF files can contain both colour and black-and-white images. TIFF files are normally compressed using an algorithm which reduces the file size dramatically. Group 4 compression is most commonly used for web use; Group 3 compression is often associated with fax machines. The tags referred to in "TIFF" describe the details of the image including size, type and compression. TIFF files can contain one or many pages. There is a good freeware browser plugin for TIFF images at

http://www.alternatiff.com/

Webrecs uses multipage TIFF files as the primary scan input files - these are OCRed and converted to PDF during import processing.

Virtualisation describes the technique whereby a physical server is split up into a number of smaller servers using a software layer which emulates the server hardware. Each virtual machine is completely isolated from the others, and each uses a specified percentage of the physical server resources (RAM, CPU, disk space). In this way applications can be run in their own isolated space. There are many advantages to virtualisation, including:

  • Cost - The cost of a physical machine is divided among the number of virtual machines
  • Isolation - all virtual machines are completely isolated from one another, which guards against security breaches and accidental data viewing
  • Backups - virtual machines are easily backed up in their entirety, making it easy to get back to the exact state of both the software and data prior to a problem
  • Scalability - as the application grows or more users are added, it is simple to add more resources to the virtual machine
  • Portability - a virtual machine snapshot can be very easily transported to new hardware

There are 2 widely used virtualisation techniques: full hardware emulation, where it is even possible to run operating systems different from that of the host server inside the virtual machine, and partial emulation, where the operating system is at least of the same generic type (e.g. different flavours of Linux). The most popular of the former is VMware (predominantly Windows); the most widely used of the latter is Parallels Virtuozzo (predominantly Linux - based on open source).

Webrecs uses Parallels Virtuozzo as its virtualisation layer.

VPN

A VPN or "Virtual Private Network" is a type of communication between 2 or more computers over a network infrastructure (for example the internet) where the communications between these computers can be considered to be restricted to these computers alone. In other words, no other computers can gain access to the data transmitted over the VPN and vice versa. Typically the connection is made by "tunnelling" a secure or encrypted protocol between the computers on the network, often SSH or some other public-key encryption technique.

Webrecs uses OpenVPN, an open-source package, for connecting Explorer-mounted drives through the CIFS interface of Alfresco. After installing a small executable on your Windows PC, you simply map a drive to the server "\\10.8.0.1" and your communications with the Webrecs server are secure. Detailed instructions are found in the Webrecs Wiki or in the shared area of your subscription.

Web Content Management is the set of software tools and processes which allow management of a website, particularly the customer-facing pages or "content". Anyone who has been responsible for a website of even moderate complexity quickly finds that the management of the information content published to the website becomes very difficult without tools to help with the creation, updating, publishing and deletion of content independently of the technicalities of displaying and rendering that content. Web content management products allow the content to be generated by non-technical people and placed in positions on the website in intuitive and user-friendly ways, at scheduled times and with appropriate security. Of course there is also the ever-growing range of content types to deal with - today content like RSS feeds, blogs and wikis is popular and needs to be managed, as do portals and portlets.

There are numerous providers of Enterprise WCM software (read expensive and heavyweight...), typically the multi-stack vendors like IBM, Microsoft, Oracle, EMC, SAP etc.

There are also plenty of open-source products, such as Joomla, Drupal and of course Alfresco.

Webrecs is ramping up the Web Content Management capabilities it offers through Alfresco - soon you will be able to manage your websites with Webrecs as well as your documents (and host them on the same box!).

WebDAV or "Web-based Distributed Authoring and Versioning" is a network protocol which sits on top of the existing HTTP or HTTPS protocols to allow for reading and writing of files hosted on a remote web server. There are some shortcomings in the Windows implementations of WebDAV, particularly on Vista and particularly over SSL. For this reason Webrecs discourages its use for the moment, preferring the CIFS solution to provide direct Explorer integration.

Workflow is an area which has traditionally been associated with Imaging. It involves the modelling of typical work practices (often defined in terms of the documents produced or updated at each stage of a business process) in a software form, allowing the automation of business processes through the automatic routing of work items across an organisation. The relationship to imaging is due to the use of digital documents as primary "assets" in the flow of work. For example, an incoming application form for an insurance claim might be scanned in, then passed on to an initial assessor, before going on to a supervisor, a payment authoriser and an investigator, each of whom uses the application form and adds their own documents to it. Ultimately there is a digital "case folder" which is associated with the claim and which becomes a permanent record against the customer.
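The insurance claim example can be modelled as a work item routed through a fixed list of stages, collecting documents along the way. The Python sketch below is purely illustrative and is not how Alfresco or Webrecs implement workflow; the class name `CaseFolder`, the claim id and the document names are all hypothetical.

```python
# The stages named in the insurance claim example, in routing order.
STAGES = ["initial assessor", "supervisor", "payment authoriser", "investigator"]


class CaseFolder:
    """Digital case folder that travels with the claim."""

    def __init__(self, claim_id, application_form):
        self.claim_id = claim_id
        self.documents = [application_form]  # starts with the scanned form
        self.history = []  # stages that have already handled the folder

    def next_stage(self):
        """Return the stage that should receive the folder next, or None when done."""
        if len(self.history) < len(STAGES):
            return STAGES[len(self.history)]
        return None

    def complete_stage(self, new_document):
        """Record the current stage's document and route the folder onwards."""
        stage = self.next_stage()
        if stage is None:
            raise RuntimeError("workflow already finished")
        self.documents.append(new_document)
        self.history.append(stage)


# Hypothetical claim: route the folder through every stage in turn.
folder = CaseFolder("CLAIM-001", "application_form.pdf")
while folder.next_stage() is not None:
    folder.complete_stage(f"{folder.next_stage()} notes")

print(folder.history)
print(folder.documents)
```

At the end of the loop the folder holds the original form plus one document from each stage, which corresponds to the permanent "case folder" described above.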

Webrecs provides some simple workflow processes (document review and task allocation) as part of its product subscriptions. The underlying platform used by Webrecs, Alfresco, has sophisticated workflow functionality built in and as such is ideally suited to customers developing their own workflows inside a Webrecs DIY subscription. Check out how workflow is used in Webrecs in this video.

Xero is an easy to use but powerful online accounting system that’s designed specifically for small businesses.

  • Xero provides a view of financial information in real time. There's no need to buy expensive software and install upgrades. Xero is available on your PC or Mac in the office, at home or on popular devices - anywhere, anytime
  • Xero automatically imports your bank statements daily so you can keep on top of your cashflow. Xero has a full suite of accounting features such as invoicing, payables, expense claims, GST and BAS Returns, reporting and much more
  • The beauty of Xero is that you can invite a number of trusted people, such as your accountants, to collaborate online, eliminating the cumbersome transfer of data that can be corrupted or out of date
  • To find out more go to www.xero.com where you can try Xero for free

Webrecs has created an Excel plugin which allows you to integrate your Document Management subscription seamlessly with Xero via an Excel spreadsheet. Use the Webrecs-Xero Excel plugin to:

  • Add Webrecs document links to your Xero entries so you can access your Webrecs documents directly from Xero - with a single button push in your Excel spreadsheet your Xero entries are created from your keyed documents!
  • Add invoices to Xero
  • Handle accounts payable and receivable
  • Use the Xero Chart of Accounts and Contacts directly in your spreadsheet
  • Open the door to outsourced data entry with your online documents

Check out the (slightly irreverent) short demo video at Webrecs-Xero for chickens, and the usage instructions at Webrecs-Xero_Integration.

We will send you the plugin for free; just contact us with your email address and we will send you the download link.

 

 

 
