Framework implementation: org.apache.uima.java
Implementation class: org.spin.scrubber.uima.reader.FileSystemCollectionReaderXML
Name: File System Collection Reader
Description: Reads files from the filesystem. This CollectionReader may be used with or without a CAS Initializer. If a CAS Initializer is supplied, it will be passed an InputStream to the file and must populate the CAS from that InputStream. If no CAS Initializer is supplied, this CollectionReader will read the file itself and treat the entire contents of the file as the document to be inserted into the CAS.
Version: 1.0
Vendor: The Apache Software Foundation

Configuration parameters (type; multi-valued; mandatory):
  KnownPHINodeList (String; true; false): List of XPaths to specific fields known to contain ONLY PHI.
  ScrubNodeList (String; true; true): List of XPaths to scrub.
  InputDirectory (String; false; true): Directory containing input files.
  Encoding (String; false; false): Character encoding for the documents. If not specified, the default system encoding will be used. Note that this parameter only applies if there is no CAS Initializer provided; otherwise, it is the CAS Initializer's responsibility to deal with character encoding issues.
  Language (String; false; false): ISO language code for the documents.
  BrowseSubdirectories (Boolean; false; false): True means include files of subdirectories, recursively, of the input directory.

Configuration parameter settings:
  ScrubNodeList:
    /Envelope/Body/PathologyCase/FullReportData
    /Envelope/Body/PathologyCase/FullReportText
    /Envelope/Body/PathologyCase/GrossDescriptionText
    /Envelope/Body/PathologyCase/DiagnosisText
  KnownPHINodeList:
    /Envelope/Header/Identifiers/FirstName
    /Envelope/Header/Identifiers/LastName
    /Envelope/Header/Identifiers/DateOfBirth
    /Envelope/Header/Identifiers/SSN
    /Envelope/Header/Identifiers/AccessionNumber
    /Envelope/Header/Identifiers/LocalMRN
  InputDirectory: data/input/cases/train
  BrowseSubdirectories: false

Index: org.spin.scrubber.uima.type.KnownPHI; kind: sorted; key: begin (standard comparator)

Capability types:
  org.apache.uima.examples.SourceDocumentInformation
  org.spin.scrubber.uima.type.KnownPHI

Operational properties:
  modifiesCas: true
  multipleDeploymentAllowed: false
  outputsNewCASes: true
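
The ScrubNodeList and KnownPHINodeList values above are absolute XPath expressions into each input XML file. The descriptor does not show how the reader applies them, so the following is only a minimal sketch of evaluating such paths with the JDK's javax.xml.xpath API. The class name XPathFieldSketch, the method extractFields, and the sample XML are all hypothetical and are not taken from the scrubber's source.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: NOT the scrubber's actual implementation.
public class XPathFieldSketch {

    // Evaluates each XPath against the XML text and returns path -> string value.
    // A path that matches no node maps to the empty string.
    public static Map<String, String> extractFields(String xml, List<String> xpaths)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        for (String expr : xpaths) {
            // evaluate(String, Object) returns the string value of the first match.
            fields.put(expr, xpath.evaluate(expr, doc));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample document that follows the path layout in the settings.
        String xml =
            "<Envelope>"
          + "<Header><Identifiers>"
          + "<FirstName>Jane</FirstName><LastName>Doe</LastName>"
          + "</Identifiers></Header>"
          + "<Body><PathologyCase>"
          + "<DiagnosisText>Benign nevus.</DiagnosisText>"
          + "</PathologyCase></Body>"
          + "</Envelope>";

        List<String> paths = Arrays.asList(
            "/Envelope/Header/Identifiers/FirstName",
            "/Envelope/Header/Identifiers/LastName",
            "/Envelope/Body/PathologyCase/DiagnosisText");

        for (Map.Entry<String, String> e : extractFields(xml, paths).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In this sketch a path with no matching node yields an empty string instead of an error, which suits an optional list such as KnownPHINodeList, where not every identifier need be present in every case file. Input that declares XML namespaces would additionally need a namespace-aware parser and a NamespaceContext, which the sketch omits.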