| Analysis | Analysis is the process of evaluating a collection of electronic discovery materials to determine relevant summary information, such as key topics of the case, important people, specific vocabulary and jargon, and important individual documents. This information is useful at the outset, before detailed review is conducted, to help with important early decisions and to improve the productivity of all remaining electronic discovery activities. |
| Batch / Batching |
Batching is the term used to describe the process of gathering documents together into “batches”, typically for the purpose of allocating documents to reviewers for categorisation (also referred to as tagging). Historically, batches of documents would normally be allocated on the basis of chronological order where attachments follow emails. The downside of this approach is that the prevalence of email chains and near-duplicates can make chronological batching a less efficient, and therefore higher cost, review workflow in which reviewers find themselves reading the same or largely similar documents multiple times. |
| Code / Coding |
The process of entering fields of information from a document and saving these in a format that will be associated with the particular document, typically within a database. The process most often refers to coding of scanned paper documents and is typically a manual process, although there are automated coding technologies which can be useful if the documents to be coded are relatively homogeneous, such as standard forms (this is very rare in relation to documents for commercial litigation matters). Common coded fields include document type, date, time, author, recipients, subject or title and attachment(s). See also Objective Coding. |
| Conceptual Search |
Refers to the latest searching technology which goes beyond keyword and phrase searching to also find documents that are related by reference to concepts. For example, say you are looking for references to banking transactions using keywords such as bank* (which, as a wildcard search, will return hits on banks, banking etc) and transaction*. Conceptual search may return hits on terms that are conceptually similar such as ‘deposit’, ‘funds’, ‘account’, ‘transfer’ etc to the extent that the algorithms used by the software to run the search have understood such terms to be similar in concept by reference to the initial search terms and the documents being searched. Another form of advanced search, referred to as ‘relevance’, is based not on concepts but on a statistical analysis of the words appearing in documents, and has been shown to be another effective means of extending searching beyond keywords and phrases. |
| Container files |
A container file is an electronic file that contains other files. A common example is ZIP files which are often used to ‘contain’ multiple files such as emails, Word documents or Excel spreadsheets. One reason for using a container file is that the ‘container’ file is normally considerably smaller in size than the sum of the files contained within. The reason for this is that when the files are added to the ‘container’ they are also ‘compressed’ (i.e. made smaller) in size. Emails are often contained within a container file. For instance Microsoft Outlook emails are contained within what is known as a PST (Personal Storage Table) file (for a single mail box) or an EDB (Exchange DataBase) file (the central store of multiple mailboxes). E-discovery processing extracts the contents of container files so that the individual files can be easily viewed. The extracted contents of container files will normally be somewhere between 50% and 250% larger in size than the original container file, which is one of the main reasons why it is difficult to predict how many gigabytes of data will ultimately be hosted (which in turn is generally the main determinant of ongoing monthly costs). |
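The size difference between a container and its extracted contents can be measured before processing. The following is a minimal sketch using Python's standard `zipfile` module on a small ZIP built in memory; the file names and contents are invented for illustration, and real PST or EDB files would need specialist tools rather than this approach.

```python
import io
import zipfile

def expansion_ratio(zip_bytes: bytes) -> float:
    """Return the extracted (uncompressed) size divided by the container size."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        extracted = sum(info.file_size for info in zf.infolist())
    return extracted / len(zip_bytes)

# Build a small illustrative container in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("memo.txt", "Quarterly figures attached. " * 500)
    zf.writestr("notes.txt", "See the spreadsheet for details. " * 500)

ratio = expansion_ratio(buf.getvalue())
print(f"Extracted data is {ratio:.1f}x the container size")
```

Highly repetitive text such as this compresses far better than typical business documents, so the ratio printed here will be much higher than the range quoted above.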
| Culling | Culling describes the process of eliminating files from a collection of electronic files. Given that the highest cost element of preparation for disclosure is normally that associated with lawyers reviewing documents, culling techniques are employed to reduce the number of documents to review. Common techniques to cull documents include deNISTing, filtering, de-duplication, near-de-duplication and email thread analysis. The approach adopted to culling documents may be one of selection of documents to include (otherwise known as an ‘inclusive’ approach) versus selection of files to exclude (known as an ‘exclusive’ approach). It is common to use both inclusive and exclusive approaches. |
| Data manipulation |
Refers to any steps undertaken to adapt or modify data. By data we mean information received in electronic form which may include documents, metadata, load files, etc. Data manipulation is generally undertaken by highly skilled technicians who have expertise in a range of software tools. One of Millnet’s strengths is that we have a number of such technicians who can write programs to perform bespoke / customised data manipulation to meet virtually any requirements. One such example is that of manipulating data exported from one type of database so that it is ready to be loaded into what would otherwise be an incompatible database. |
| Database | A database is a collection of related data organised for efficient access. There are many different database software packages and the most common within the litigation support market are from leading IT companies Microsoft and Oracle. The litigation support review software ‘connects’ to the data contained within the database but is in most instances a completely separate piece of software. One of the reasons the term ‘platform’ is used to describe a hosted review offering is that the service comprises various elements of software and IT hardware. Within a litigation support context there are generally five main types of data stored in the database.
|
| De-duplication / customised de-duplication |
De-duplication is one of the most effective and common ways in which to reduce the volume of documents for review. De-duplication is also one of the most confusing areas of e-discovery where technology firms and lawyers often misunderstand each other. The starting point is to understand how duplicates are identified so as to understand what constitutes a ‘duplicate’. The easiest and most common form of de-duplication is by MD5 or ‘hash’ value. MD5 stands for Message Digest algorithm 5, which has become the industry standard algorithm for calculating a 32 digit hexadecimal number (i.e. a number consisting of 32 characters where each character is one of 16 possible characters 0-9 and a,b,c,d,e,f). The MD5 value is effectively a ‘fingerprint’ for an electronic file that reflects the binary content of the file, including any metadata stored within the file itself but not file system properties such as the file name or dates. If two documents have the same MD5 value they will, for practical purposes, be identical. For emails the approach adopted is slightly different in that particular metadata fields may be selected to calculate what is known as a cryptographic hash value (otherwise known as a ‘hash’ value). For example, the Nuix software creates MD5 hash values for emails based on the to, from, cc, subject and body text without reference to spaces and attachment data. Having calculated MD5 values for all electronic documents it then becomes a matter of comparing the values across the collection of data to identify (and remove or hide) the duplicates. However, consideration needs to be given as to the approach to de-duplication owing to the issues of context and document ‘families’. The issue of document ‘families’ is most important in relation to emails and their attachments. For instance, it is common for identical files such as a Word document or Excel spreadsheet to be attached to non-identical emails.
This may occur for instance where an email is sent attaching a spreadsheet file which in turn is forwarded on to another person in a new email without having altered the original Excel file. Normally lawyers will want to review emails and attachments together even to the extent that such attachments may be duplicated in the document collection. For this reason the most common approach to de-duplication is to de-duplicate across emails at the ‘top level’ thereby leaving duplicated attachments in the database.
The second consideration is whether to de-duplicate across custodians or only within custodians. In a typical scenario, we might have the email boxes and ‘my documents’ folders from a number of people who are from the same firm and who were working on the same project that is the subject of litigation. It is normal for these people to have been on the same distribution list and to have therefore received the same emails. When their documents are collected there will be a high degree of duplication. The question of whether to de-duplicate across all custodians or only within each custodian may depend upon the importance of knowing who had what documents / emails in their possession. It is worth noting that in relation to emails the metadata (sender, recipients) will generally provide this information. Reflecting on the nature of electronic documents collected and the way in which custodians were likely to have been communicating with each other will provide some guidance as to the likely level of duplication within a collection of electronic documents. For instance, if various copies or backups of the same custodian’s documents are collected, the level of duplication may be quite high. If the same is done for a number of custodians who worked closely together on the same projects, the level of duplication across the custodians will also normally be quite high. By ‘customised de-duplication’ we refer to a process of de-duplication using narrower criteria than those normally used. For example, we have encountered instances where use of Blackberry devices or certain email archiving systems has inserted irrelevant additional lines of text, such as a confidentiality statement, which cause otherwise identical emails not to be identified as duplicates. Millnet’s programmers are able to write bespoke programming code to accommodate such circumstances and ensure a more effective de-duplication as a result, thereby saving costs.
Note that de-duplication is not the same concept as near de-duplication or email threads / chains. Refer elsewhere in this glossary for further explanation of these concepts. |
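The hash-based comparison described in this entry can be sketched in a few lines of Python using the standard `hashlib` module. This is an illustration only: the `email_md5` function and its choice of fields simply follow the description above (to, from, cc, subject and body, ignoring spaces) and are not the actual algorithm of Nuix or any other product, and the dictionary layout of an email is an assumption made for the example.

```python
import hashlib

def file_md5(content: bytes) -> str:
    """MD5 'fingerprint' of a file's binary content (32 hex characters)."""
    return hashlib.md5(content).hexdigest()

def email_md5(email: dict) -> str:
    """Hash selected email fields with all whitespace removed.

    The field list is an assumption based on the glossary text.
    """
    fields = ("to", "from", "cc", "subject", "body")
    parts = ["".join(email.get(name, "").split()) for name in fields]
    # A separator keeps field boundaries unambiguous.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()

def deduplicate(items, fingerprint):
    """Keep the first item seen for each fingerprint value."""
    seen, unique = set(), []
    for item in items:
        value = fingerprint(item)
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique

emails = [
    {"to": "a@example.com", "from": "b@example.com", "subject": "Q3", "body": "See attached."},
    {"to": "a@example.com", "from": "b@example.com", "subject": "Q3", "body": "See  attached. "},
    {"to": "c@example.com", "from": "b@example.com", "subject": "Q3", "body": "See attached."},
]
unique_emails = deduplicate(emails, email_md5)
print(len(unique_emails), "unique emails out of", len(emails))
```

De-duplicating within each custodian rather than across all custodians would amount to running `deduplicate` separately on each custodian's items.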
| Deconstruction | The process of dismantling a folder, bundle or otherwise fastened together collection of hard copy documents in order to scan individual pages. Deconstruction typically includes the removal of documents from binders, removal of bindings, paperclips, bulldog clips, elastic bands, plastic or card wallets, staples etc. Millnet’s approach to scanning involves the same scanning operator deconstructing, scanning, unitising and reconstructing each file or bundle of documents thereby ensuring higher levels of accuracy and quality. |
| deNIST | The process of removing irrelevant systems and other non user created files from a collection of electronic data. The US National Institute of Standards and Technology ‘NIST’ regularly publishes an updated list of digital fingerprint values for known systems files (the values are the same MD5 format as used to de-duplicate identical electronic files). The process of filtering out files that appear on the NIST list is often referred to as ‘deNISTing’. This is often the first step in the process of culling data where a broad approach to collection has been adopted, for instance where an entire laptop or PC hard drive has been forensically imaged and therefore contains a large volume of irrelevant systems files. |
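In outline, deNISTing is a set-membership test on hash values. The sketch below assumes a set of known system file hashes has already been loaded; the `known_hashes` values here are generated from made-up data for the example and are not taken from the NIST list.

```python
import hashlib

def denist(files: dict, known_hashes: set) -> dict:
    """Return only those files whose MD5 value is not on the known-files list.

    files maps a file name to its binary content.
    """
    return {
        name: data
        for name, data in files.items()
        if hashlib.md5(data).hexdigest() not in known_hashes
    }

# Hypothetical collection: one system file and one user-created file.
collection = {
    "kernel32.dll": b"pretend system file content",
    "board_minutes.doc": b"pretend user document content",
}
# Stand-in for the published list (illustrative values only).
known_hashes = {hashlib.md5(b"pretend system file content").hexdigest()}

remaining = denist(collection, known_hashes)
print(sorted(remaining))
```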
| Email chain / thread / conversation |
Email chains are created by forwarding and / or replying to an original source email. Email chains are one of the most problematic issues in relation to efficiently managing email centric document reviews owing to the fact that as chains grow it is common for the text of all prior emails to remain in the body of each new email that is created when forwarding or replying. As a result, whilst there is often a high level of duplication of the content within each individual email in a chain there is still a requirement to review each email as it is possible for the author of each new email in the chain to have altered the email body text and / or to have included or excluded attachments at different points in the thread. Further complicating the review is that email threads often resemble a ‘tree’ structure whereby a single original email may give rise to hundreds or thousands of separate branches each representing a new email chain. Millnet offer a service whereby the emails at the end of each thread or chain can be identified (and confirmed to contain all the text of those emails earlier in the chain). Depending upon the approach to review it may therefore be possible to review only the end email in the chain. Another approach to using this analysis is that when a reviewer comes across an email that is, say, irrelevant (e.g. a conversation about football results) the reviewer can identify other emails in the same thread at the click of a button and then tag all of them rather than waiting to come across emails in the thread again and again. |
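The idea of identifying the end email of each branch, and checking that it contains the text of the earlier emails, can be sketched as follows. The data layout (each email recording the id of the email it replies to) and the simple substring check are assumptions made for illustration; this is not a description of how Millnet's service works, and real quoted text usually needs normalising before comparison.

```python
def thread_end_emails(emails: dict) -> list:
    """Return ids of emails that no other email replies to or forwards."""
    parents = {e["parent"] for e in emails.values() if e["parent"] is not None}
    return sorted(eid for eid in emails if eid not in parents)

def contains_all_earlier_text(emails: dict, end_id: str) -> bool:
    """Check that every earlier email's body appears within the end email."""
    body = emails[end_id]["body"]
    current = emails[end_id]["parent"]
    while current is not None:
        if emails[current]["body"] not in body:
            return False
        current = emails[current]["parent"]
    return True

# A small 'tree': one original email with two separate branches.
emails = {
    "a": {"parent": None, "body": "Lunch on Friday?"},
    "b": {"parent": "a", "body": "Yes, see you there.\n> Lunch on Friday?"},
    "c": {"parent": "a", "body": "Sorry, I am away."},
}
ends = thread_end_emails(emails)
safe_to_review_alone = [e for e in ends if contains_all_earlier_text(emails, e)]
print(ends, safe_to_review_alone)
```

Here email "c" is an end email but does not quote the original, so under this sketch the original would still need to be reviewed alongside it.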
| Embedded File |
Describes a file that is wholly contained within another file where the file within which it is contained is not itself a container file (refer above). For example, it is becoming more common to find emails, spreadsheets, Word documents and PowerPoint presentations embedded within other such documents. Where embedded files exist it is common that the contents of the embedded file(s) are not visible (or wholly visible) on the face of the file in which the file(s) are embedded. For instance, an Excel spreadsheet may be embedded within a PowerPoint presentation, however, when viewing the presentation file the spreadsheet may appear as a small table of figures or a chart whereas all supporting calculations and potentially other relevant information are effectively ‘hidden’ within the embedded spreadsheet. Millnet’s Smart e-Discovery processing extracts all embedded files thereby mitigating the risk of missing important or even privileged documents. |
| Encrypted | When using Millnet’s online litigation support services, all data transferred between your desktop PC or laptop and Millnet’s servers based in London is encrypted (i.e. enciphered / encoded). This is a security precaution so that, were someone to ‘hack’ into the transmission of data between the server and your computer, the data would be in a coded form that would be unreadable. The level of encryption depends on the web browser software being used to access the internet. If you are using the latest version of Microsoft Internet Explorer to access the internet then the encryption level may be as high as ‘256 bit’ which means there are 2^256 possible keys to decode the data (since this is too large a number to represent, consider that a 128 bit encryption code has a potential 340 trillion trillion trillion possible combinations to find the key to the code). In order to encrypt the data, the first time you login to Millnet’s Secure Global Desktop ‘SGD’ service, a small program is downloaded and saved on your computer. It is this program that assists with the process of encrypting and decoding the data. Given the current level of computing power, this level of security is virtually unhackable and therefore the security risk primarily lies with the username and password you use to access the system. Millnet issue new users with highly secure passwords (comprising a mixture of upper and lower case letters, numbers and other characters, 11 characters in length). If security is a major concern, Millnet also has an innovative biometric security solution that requires the user to provide a fingerprint in order to gain access to the system (refer to two factor authentication below). |
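The key counts quoted in this entry follow directly from the key length: a key of n bits has 2^n possible values. A short Python check of the arithmetic (the function name is just for this example):

```python
def key_space(bits: int) -> int:
    """Number of possible keys for a key of the given length in bits."""
    return 2 ** bits

keys_128 = key_space(128)
keys_256 = key_space(256)

TRILLION = 10 ** 12
print(f"128 bit: about {keys_128 // TRILLION ** 3} trillion trillion trillion keys")
print(f"256 bit: a {len(str(keys_256))}-digit number of keys")
```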
| Exchange database file ‘EDB’ |
Microsoft Outlook is by far the most widely used email software and as such when electronic documents are collected for review they are often in Outlook format. Most companies will operate their email on a central server (i.e. computer) which synchronises with the Outlook software on the users’ laptop / desktop computers. The software running on the server is Microsoft Exchange (sometimes referred to as ‘Outlook Exchange’). When email boxes are collected for one person or a small number of people they will generally be provided in the Outlook PST format (Personal Storage file) whereas if it is appropriate to look at the email boxes of the entire company or a large proportion of people then it is normally appropriate to collect the entire Exchange Database file, i.e. the ‘EDB’. EDB files will appear as a single file when viewed on a hard drive and cannot be opened without specialist software (or loading back into the Exchange software). EDB files will normally yield hundreds of thousands if not millions of emails and attachments. Because the emails and attachments contained within an EDB are ‘compressed’, when the files are extracted from within the EDB it is normal for the resulting volume of data to be between three and five times the size of the original EDB file (which in turn has consequences for the cost of hosting this data, given that hosting costs are generally related to the volume of data hosted). |
| Exporting | The process of copying or moving selected documents / files from a database. Exporting occurs at various stages in the e-discovery process including when moving documents / files from the processing or early case assessment stage into the review database and / or when producing documents for disclosure or bundles. Whilst the process of exporting is largely automated it is an aspect of e-discovery that requires care and attention to detail so as to ensure that data / files are not missed, inadvertently added, corrupted or altered during the export process. |
| Extraction | Extraction refers to the process of copying or separating out files or other data from an original source. For example, obtaining a forensically sound copy of emails from an email server (i.e. the computer on which emails are centrally stored) involves a process of extraction. Likewise, extraction is the term used during the e-discovery processing stage to describe opening and ‘pulling out’ the contents of files that are within other files, including container files and embedded files (files inside a file that in turn is not normally considered to be a container file). |
| Family i.e. document ‘family’ |
The most common example of a document ‘family’ is that of emails and their attachments. As a minimum, a document ‘family’ consists of a ‘parent’ (such as an email) to which one or more ‘children’ are ‘attached’. This ‘parent-child’ relationship extends to more than just emails with attachments. For instance, it is increasingly common for the authors of documents to ‘embed’ (insert) files inside other files. For example, a Word document may have inserted within it other Word documents, Excel spreadsheets or even emails which in turn may have other documents attached. Further, it is useful to retain the concept of family relationships between not just emails or other files but also ‘container’ files (refer above) such as ZIP files and even folders. Just as it is generally more efficient to review attachments to an email at the same time as reviewing the email, it is often helpful to review the entire contents of say a ZIP file or a particular folder at the same time. It is human nature to collect related documents in the same electronic ‘folder’ or ‘container’. It is also important to note that the concept of ‘children’ can extend to ‘grandchildren’, ‘great-grandchildren’ etc to the extent that an attachment to an email has in turn attachments which in turn have attachments (or embedded files). It is not uncommon for a single email to belong to a ‘family’ of documents that could number in the hundreds or even thousands of separate electronic documents. One of the key elements of e-discovery processing is that of ‘extracting’ all children, grandchildren etc whilst retaining the attachment relationship information. Whilst it is normal practice to disclose certain documents as an entire ‘family’ (especially emails with attachments) this has historically resulted in a need to review all of the documents in the ‘family’ prior to disclosure. The approach to reviewing and disclosing document ‘families’ is a key element of designing the ‘workflow’ for review and disclosure (refer below).
Let’s say a keyword search is undertaken over an email-centric document collection that results in ‘hits’ on 5,000 documents representing a mixture of emails and attached documents. It is often the case that once all ‘family members’ are ‘pulled in’ by virtue of their association with the 5,000 documents that were responsive to keywords, there may be 30,000 or even 50,000 documents that are potentially discloseable. Depending on the nature of the matter, a strategy therefore needs to be considered for going about the review efficiently whilst minimising the risk of for instance disclosing privileged family member documents that were not responsive to the original keyword search terms but included by virtue of association with a document that was responsive. |
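The way keyword hits ‘pull in’ their family members can be sketched as follows. The sketch assumes the processing stage has recorded, for each document, the id of its parent (or nothing for a top-level document); the ids and the `parent_of` mapping are invented for illustration.

```python
def expand_to_families(hit_ids: set, parent_of: dict) -> set:
    """Return every document belonging to the same family as any hit.

    parent_of maps each document id to its parent id, or None for a
    top-level document such as an email.
    """
    def root(doc_id):
        # Walk up through parents, grandparents etc to the top-level document.
        while parent_of[doc_id] is not None:
            doc_id = parent_of[doc_id]
        return doc_id

    hit_roots = {root(doc_id) for doc_id in hit_ids}
    return {doc_id for doc_id in parent_of if root(doc_id) in hit_roots}

parent_of = {
    "email-1": None,
    "email-1/report.doc": "email-1",
    "email-1/report.doc/chart.xls": "email-1/report.doc",  # a 'grandchild'
    "email-2": None,
    "email-2/logo.gif": "email-2",
}
hits = {"email-1/report.doc/chart.xls"}  # responsive to the keyword search
to_review = expand_to_families(hits, parent_of)
print(sorted(to_review))
```

Documents in `to_review` that are not in `hits` are the family members that were not themselves responsive to the search terms, which is the group the paragraph above flags as carrying a risk of inadvertent disclosure.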
| Field | ‘Field’ is an IT term relating to databases. Relating IT terminology back to the legal process of review and disclosure where a ‘document’ is synonymous with a ‘record’ in the database, a ‘field’ is synonymous with a piece of information about the document such as its date (which in turn may be defined in several ways and may therefore give rise to several fields in the database). Perhaps the simplest way to describe this is to think of a ‘document’ as relating to the individual lines in a disclosure list (note however that it is common to group individual documents together into a ‘family’ (see above) by for instance describing an email and referencing the existence of attachments without being specific as to the details of those attachments) whereas a ‘field’ constitutes the columns in the list (typically consisting of items of data such as date, author, subject / title or description, recipient(s) etc). ‘Fields’ will arise in a database or list by virtue of the process of extracting ‘metadata’ or manually entering the information, a process generally referred to as coding. Fields are used to sort and filter lists of documents and will also often contain discloseable information that is not otherwise available on the face of the documents. The ‘Data Exchange Protocol’, a document created by the UK association of litigation support professionals (see www.listgroup.org), defines the fields and the formats thereof that are recommended best practice for disclosure of electronic documents. |
| File Extension |
Is the suffix to the electronic file name normally in the format of a dot (“.”) followed by three or four characters. The file extension indicates the type of file, for instance a Microsoft Word file most commonly has the file extension “.doc”. The file extension is used by your computer’s operating system (most commonly Microsoft Windows) to associate each particular file with the software application that should be used to open and view the file. Note however, that a file extension is not a foolproof way of determining file type as it is a very simple process to alter or even remove a file’s file extension. It is a common and low tech ploy by some users to remove or alter the file extension of files they wish to hide or to make such files difficult to access. Early e-discovery processing software often relied on the file extension to identify the type of files which in turn meant that errors in file extensions or deliberate tampering with a document’s file extension might result in such files being missed or otherwise not viewable during the legal review. Millnet utilises the latest technology such as Nuix for processing files. These software tools look beyond the file extension to determine the nature of the underlying file thereby reducing the risk of missing potentially relevant documents.
There are literally millions of different file types. To the extent that you want to ascertain the type of file from its file extension there are various extensive online libraries including www.file-extensions.org and www.filext.com |
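One common way of looking beyond the file extension is to inspect the first few bytes of the file, which for many formats contain a fixed ‘signature’. The sketch below checks a handful of well-known signatures and flags files whose extension does not match; it illustrates the general technique only and is not a description of how Nuix or any other product identifies files. The short signature table and the `EXPECTED` mapping are deliberately incomplete.

```python
import os

# A few well-known leading byte signatures (far from exhaustive).
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "zip"),      # also used by .docx / .xlsx / .pptx files
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc / .xls / .ppt
]

# Which detected type each extension is expected to have (illustrative subset).
EXPECTED = {".pdf": "pdf", ".zip": "zip", ".gif": "gif", ".jpg": "jpeg", ".doc": "ole2"}

def sniff_type(data: bytes) -> str:
    """Identify a file type from its leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when the content is recognised but disagrees with the extension."""
    extension = os.path.splitext(filename)[1].lower()
    detected = sniff_type(data)
    return detected != "unknown" and EXPECTED.get(extension) != detected

print(extension_mismatch("holiday.jpg", b"%PDF-1.4 rest of file"))   # renamed PDF
print(extension_mismatch("report.pdf", b"%PDF-1.4 rest of file"))    # consistent
```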
| Filtering | May refer to both a process of ‘filtering’ undertaken as part of the e-discovery processing stages and / or a feature that is common in litigation support review software. The process of ‘filtering’ is one of including or excluding files by reference to certain criteria. For instance, one of the most common early stage filters applied is that of filtering the electronic documents to remove non user created files and systems files. This is often referred to as deNISTing. It may also be appropriate to filter out logo or other graphic files that can often constitute a large volume of the irrelevant documents. ‘Filter’ as a feature in a review database is where the users of the software can sift (include or exclude) files by criteria such as file type, date range, size, author etc. When presented with a list of 10,000 documents and no obvious criteria for searching, filtering the list to exclude certain files may be an effective way to zero in on the most relevant files. |
| Forensically sound |
Prior to undertaking a legal review of documents there will be some form of document ‘collection’ stage. When collecting electronic documents it is important to be aware of the impact that the actual process of collection may have on the metadata of the documents being collected. In particular, metadata relating to dates of creation and access of a document may be important to preserve. Also, metadata relating to the precise source (i.e. original location) of the documents may also be highly relevant. The other aspect of collection to bear in mind is that copying files from one location to another will normally miss all deleted and / or hidden data that may reside on digital media such as computer hard drives. The approach to collection may range from custodians of the documents copying electronic files onto a removable electronic media such as a CD, DVD, USB drive or external hard drive to the IT department of the client performing this task potentially using some techniques that preserve the metadata. A ‘forensically sound’ collection refers to one where techniques are employed to perform the data copying process so as to preserve all or as much as is practical of the metadata and to ensure that there is a defensible chain of custody of the data so that no accusation of ‘contamination’ or ‘tampering’ with data may arise during proceedings. The question of when to adopt a forensically sound approach to collection of electronic documents (i.e. normally to engage third party experts) is a judgement call to be made by reference to the application of Part 31 CPR. There will be additional costs associated with adopting a ‘forensically sound’ approach and these may often outweigh the benefits of adopting such an approach. Millnet’s rules of thumb when advising clients which route to take include:
For normal commercial disputes don’t be overly concerned by the warnings of forensic technology experts as to the risks associated with altering metadata or possibly being caught out on disclosure or at trial owing to questions as to the provenance or admissibility of documents relating to the way in which they were collected. So long as it is possible to trace the provenance of a particular document back to its original source (which is no different from dealing with paper evidence), any tampering which occurred prior to collection will be a separate issue which has nothing to do with the approach adopted to collection. |
| Forensic image file |
The process of creating an exact duplicate from a source of electronic data (most commonly a hard drive but also other storage media such as backup tapes, disks, USB keys, flash drives etc). ‘Exact’ in a forensic sense involves creating a copy at the ‘binary’ level (i.e. the 0’s and 1’s that are the building blocks of computer data) and includes files that may be deleted, hidden or otherwise stored on the source media in such a way that they are not visible without the use of specialist forensic software tools. The degree of skill required to create a forensic image file varies considerably depending mainly on the source device. Certain devices such as PDAs and mobile phones as well as old computers and servers may be particularly problematic and require specialist software and forensic skills. However, at the other end of the spectrum the creation of a forensic image of a relatively modern (say less than 5 years old) desktop or laptop computer has become a very straightforward and relatively commoditised process. Refer to the notes above (re the term ‘forensically sound’) as to when a requirement to create a forensic image file may arise. The resulting forensic image does not resemble the source data as it will be contained within a ‘container’ file which in turn needs to be ‘opened’ using either the software used to create the original forensic image file or other specialist software.
The advantage of Nuix, one of Millnet’s primary e-discovery processing software tools, is that Millnet can process directly from a forensic image file thereby skipping a stage whereby the forensic image file either needed to be ‘extracted’ or otherwise converted prior to undertaking the e-discovery processing stage. Note also that to the extent that the original source file contained hidden or deleted data that was not visible to the custodian using normal file searching software (such as Windows Explorer), this will be available for analysis, search and extraction by a forensic consultant using specialist software tools. The requirement to search for hidden or deleted data goes beyond normal e-discovery processing and gives rise to additional potentially significant forensic consultancy charges. |
| Formatting | Format describes the way in which data is arranged and / or the form it is in. For instance ‘file format’ refers to the type of electronic file. The process of formatting is one of creating data arranged / presented in a particular way. Similarly ‘reformatting’ involves altering the existing format in some way. Formatting and / or reformatting are commonly required throughout the e-discovery process. Examples include:
In summary, formatting is generally any process required to standardise or otherwise present data in a form as specified by the legal review team to make such data as user friendly as possible during the review process and in readiness for disclosure. It should also be noted that there are generally accepted formats for the fields of data for disclosure and to the extent that such guidelines are not followed this can give rise to additional cost and time wasted. |
| GB / Gigabyte |
Gigabyte. The basic building block of all computer data is a ‘bit’ (a ‘0’ or ‘1’). A byte is normally 8 bits which in turn normally equates to the number of bits required to represent a character. A gigabyte = 1,000,000,000 (i.e. one thousand million bytes) of data. Equating GB to documents, files, pages or any other real world representation of document volume depends largely on the type of files and their size. A CD holds up to around 0.7 GB. Millnet’s rule of thumb is that a GB equates to approximately 15,000-20,000 black and white A4 sized scanned pages of paper. Electronic files are harder to benchmark because they vary widely in size and density of data. For instance, a gigabyte of data could well contain 20,000+ emails and attachments or even more than half a million pages of simple text files. As a generalisation, files that contain high levels of graphical content (especially colour) such as PowerPoint presentations as well as colour photographs, music and video files are far larger in size than mainly textual documents such as emails and Word documents. The number of emails in a gigabyte will normally be determined more by the extent and nature of attachments than the emails themselves.«top |
| GIF | Short for Graphics Interchange Format. GIF and JPEG files are two of the most common graphic file formats found in most electronic document collections. The prevalence of irrelevant graphic files often presents a challenge for the legal review because they are often included in the review database by virtue of belonging to a document family (especially as attachments to emails, such as company logo files), and even though they are clearly irrelevant it may still take time for reviewers to ‘click’ past each logo file one at a time. To illustrate how inefficient this can be, imagine a collection of 50,000 emails which upon e-discovery processing yields 30,000 GIF files containing logos and other graphics attached to or embedded within other files. If it takes a reviewer 2 seconds per file to click, mark as irrelevant and move on to the next document, then this will equate to approximately 17 hours of review time, which at say £175 per hour = £2,975 of additional time charges. Further, there will be ongoing monthly per GB hosting charges associated with storing these irrelevant files. Millnet’s software technology enables the highly efficient classification of GIF and other image files prior to loading data into a hosted review system for legal review. This combination of technology with a review workflow that is optimised to reduce lawyer review time wasted on reviewing obviously irrelevant files reflects Millnet’s experience and commitment to assisting law firms to go about the review and disclosure process as efficiently and cost-effectively as possible. |
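The cost illustration in the GIF entry above can be reproduced with a few lines of arithmetic. The following is a minimal sketch only: the function name `review_cost` and the choice to round up to whole billable hours are assumptions made so that the result matches the entry's figures (30,000 files, 2 seconds each, £175 per hour).

```python
import math

def review_cost(file_count, seconds_per_file, hourly_rate):
    """Return (billable_hours, cost) for clicking past irrelevant files.

    Hypothetical helper: rounds up to whole hours, which is an assumption
    chosen to match the rounded figure of 17 hours used in the GIF entry.
    """
    total_seconds = file_count * seconds_per_file
    billable_hours = math.ceil(total_seconds / 3600)
    return billable_hours, billable_hours * hourly_rate

# Figures taken from the GIF entry: 30,000 files, 2 seconds each, 175 per hour.
hours, cost = review_cost(file_count=30_000, seconds_per_file=2, hourly_rate=175)
print(f"{hours} hours, GBP {cost:,}")  # 17 hours, GBP 2,975
```

Changing `seconds_per_file` or `hourly_rate` shows how quickly the cost of reviewing obviously irrelevant files grows with collection size.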
| Hosted / Hosting |
A term for a service provided by a third party such as Millnet, where access to the service is provided via the internet. A small proportion of UK law firms (fewer than 30) have acquired litigation support database software (the most common brands being Ringtail, Introspect and Concordance), and as such their lawyers may work on documents that are ‘hosted’ on a central computer (server) located within the law firm. Litigation support firms such as Millnet increasingly provide access to documents relating to a particular matter within a review software ‘platform’ that is accessed via the internet by logging in with a username and password. The benefits of an outsourced ‘hosted service’ versus acquiring the software directly and having an internal service include:
|
| Image File |
An image file is an electronic representation of a document created either from a scanned paper document or from the original electronic format (‘native’ file). The most common image formats are TIFF, PDF and JPEG. Most image files are not searchable unless they have been subject to Optical Character Recognition (‘OCR’). The exception to this is searchable PDFs (i.e. when created, a PDF can be subjected to OCR processing to create a searchable PDF). Because image files are normally non-searchable, it is important to bear in mind their potential existence and relevance when searching across a collection of electronic documents. |
| JPEG | Stands for Joint Photographic Experts Group, the creators of the standard underlying this type of electronic file. JPEG files are the most common format for digital photographs and are increasingly common within collections of electronic documents. JPEG files that are colour photographs can often constitute one of the largest categories of file type based on size. The approach to be taken to address JPEG files during the review and disclosure process should be considered well in advance so as to avoid unnecessary wasted time and cost. Points to consider include:
For other comments relating to the approach to reviewing graphical files, refer to the GIF entry above. |
| Kryptonite | James Moeskops is said to eat this for breakfast. |
| Lateral navigation |
Navigation refers to the way in which users of a hosted review database move around to review and search for documents (refer below for more information). In the early years of e-discovery, navigation within hosted review databases was very simple and mimicked the “linear” (i.e. document by document) process of working through a folder or list of documents. The only time-saving advantage in this regard is the ability to sort the documents into a meaningful order (typically chronological) at the click of a button. The latest review technologies recognise that reviewers may wish to pursue lines of investigation that do not correspond with the order in which documents appear in a list. For instance, a reviewer comes across a particularly ‘hot’ document whilst performing a linear chronological review. This triggers a range of questions such as “what other documents were in the same folder as this document?” or “what other documents are there in the database that contain similar content?” By utilising the latest review technologies it is possible to ‘navigate’ away from the current document to answer these questions without losing track of the point at which you started. The closest analogy is that of surfing the web. In early versions of web browser software such as Internet Explorer, a hyperlink would take you away from the page you were on to another page. Sometimes this would open a new ‘window’ and other times it changed the window you were in. Once within this new page you could jump to more new pages and so on. The means of return was via the ‘back’ button. Later versions of Internet Explorer introduced ‘tabs’, which means a new page opens in such a way that you can just click back to the tab you were on to view the original starting document at any stage.
e-Discovery review software has either adopted or is in the process of adopting this approach to ‘lateral navigation’ and the productivity benefits in terms of time taken to review and find specific documents can be significant. |
| Load file |
A load file contains the elements of data required to add documents into a litigation support review database. Refer to the definition of a database for further explanation of the various elements. If particular elements of the data in the database need to be updated (for instance to add new fields that were not part of the original load file), then the new data is provided in a load file that is often referred to as an ‘overlay file’, in that it ‘overlays’ or appends data relating to existing documents in the database. Where the parties to a dispute are both using a litigation review database, electronic disclosure will normally involve providing the data in an agreed load file format. It is possible to manipulate data received in an incompatible format, though best practice is for there to be discussion and agreement as to the format of disclosure in the early stages of a matter so as to avoid unnecessary wasted time and cost. |
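The difference between a load file and an overlay file, as described in the entry above, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the comma-separated layout, the field names (`DocID`, `Date`, `Author`, `Custodian`), the sample rows and the helper names. Real review platforms define their own load file formats, which should be agreed between the parties.

```python
import csv
import io

# Hypothetical load file: one row per document, keyed by a DocID field.
LOAD_FILE = """DocID,Date,Author
DOC001,2008-01-14,A. Smith
DOC002,2008-01-15,B. Jones
"""

# Hypothetical overlay file: adds a Custodian field to existing documents.
OVERLAY_FILE = """DocID,Custodian
DOC001,Finance
DOC002,Legal
DOC999,Unknown
"""

def read_rows(text):
    """Parse delimited text into a list of field-name -> value dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def load(database, rows):
    """Add new documents to the database, keyed by DocID."""
    for row in rows:
        database[row["DocID"]] = dict(row)

def overlay(database, rows):
    """Update existing documents only; return DocIDs that were not found."""
    unmatched = []
    for row in rows:
        record = database.get(row["DocID"])
        if record is None:
            unmatched.append(row["DocID"])
        else:
            record.update(row)
    return unmatched

database = {}
load(database, read_rows(LOAD_FILE))
unmatched = overlay(database, read_rows(OVERLAY_FILE))
print(database["DOC001"])
print(unmatched)
```

In this sketch the overlay never creates new documents; it reports identifiers it cannot match so that a mismatch between the overlay and the original load file is noticed rather than silently ignored.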
