OCR (Optical Character Recognition) is the recognition of printed or handwritten text by a computer. It extracts the characters from images or scanned documents, which makes images that contain text searchable. OCR is a very useful feature for any ECM product. In this blog, we will see how to configure it in Alfresco Community Edition. We have tested this with Alfresco versions 5.1.f and 5.2.e; it should also work with nearby versions.
Prerequisites:
Steps to Configure Tesseract:
1. Download and install Tesseract
Linux:
apt-get install tesseract-ocr
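Because ocr.sh calls Tesseract with -l eng, it can help to confirm that the English language data is installed before going further. The function below is a minimal sketch: the name has_tesseract_lang is hypothetical, and it assumes your Tesseract build supports the --list-langs option. Some older releases print that list on stderr, which is why both streams are searched.

```shell
# Hypothetical helper: succeed if the given tesseract binary lists the
# requested language pack. ocr.sh passes "-l eng", so "eng" must be present.
# stderr is merged because some older releases print the list there.
has_tesseract_lang() {
  local bin=$1 lang=$2
  "$bin" --list-langs 2>&1 | grep -qx "$lang"
}
```

For example, has_tesseract_lang /usr/bin/tesseract eng && echo "eng is installed". Running tesseract --version also shows which release was installed.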
2. Stop the Alfresco Tomcat server
./alfresco.sh stop tomcat
3. Download the Linux or Windows context file and place it in
/tomcat/shared/classes/alfresco/extension/
4. Place ocr.bat (Windows) or ocr.sh (Linux) at /
a) ocr.bat (for Windows)
REM log the arguments to see what happens
if not exist C:\tmp mkdir C:\tmp
echo from %1 to %2 >> C:\tmp\ocrtransform.log
copy /Y "%~1" "C:\tmp\%~n1%~x1"
echo target %~d2%~p2%~n2
REM call tesseract; it appends .txt to the output base (the target path without its extension)
"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" "C:\tmp\%~n1%~x1" "%~d2%~p2%~n2" -l eng
b) ocr.sh (for Linux)
#!/bin/bash
# save arguments to variables
SOURCE="$1"
TARGET="$2"
TMPDIR=/tmp/Tesseract
FILENAME=$(basename "$SOURCE")
OCRFILE="$FILENAME.tif"
# Create temp directory if it doesn't exist
sudo mkdir -p "$TMPDIR"
# to see what happens
#echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log
sudo cp -f "$SOURCE" "$TMPDIR/$OCRFILE"
# call tesseract; it appends .txt to the output base, so strip TARGET's extension
sudo /usr/local/bin/tesseract "$TMPDIR/$OCRFILE" "${TARGET%\.*}" -l eng
#sudo tesseract "$TMPDIR/$OCRFILE" "${TARGET%\.*}" -l eng
sudo rm -f "$TMPDIR/$OCRFILE"
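The ${TARGET%\.*} expansion in ocr.sh removes the target's extension because tesseract appends .txt to the output base it is given. A minimal sketch of that behaviour, using a hypothetical helper name output_base and made-up example paths:

```shell
# Hypothetical helper mirroring ${TARGET%\.*} from ocr.sh: remove the
# shortest suffix that starts with a dot, i.e. the final extension.
output_base() {
  printf '%s\n' "${1%.*}"
}

# output_base /tmp/Alfresco/target_123.txt   prints /tmp/Alfresco/target_123
# output_base /tmp/v1.2/target               prints /tmp/v1 (cuts into the directory)
```

The second case shows the limitation: if the file name has no extension but a directory name contains a dot, the expansion cuts into the directory, so the script relies on the target path ending in an extension such as .txt.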
Note: Make sure that the path to the tesseract executable is correct in ocr.sh / ocr.bat. Typical locations are:
Linux:
/usr/local/bin or /usr/bin
Windows:
C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
or C:\Program Files\Tesseract-OCR\tesseract.exe
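If you are unsure which Linux location applies, a small lookup like the following can tell you which path to put in ocr.sh. It is a sketch only: find_tesseract is a hypothetical name, and it checks the given candidates in order before falling back to whatever tesseract resolves to on the PATH.

```shell
# Hypothetical helper: print the first candidate that is an executable
# regular file; otherwise fall back to the tesseract found on $PATH.
# Returns non-zero (and prints nothing) if neither is found.
find_tesseract() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  command -v tesseract
}
```

For example: find_tesseract /usr/local/bin/tesseract /usr/bin/tesseract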
5. If the user running Alfresco does not have read and execute permissions on ocr.sh, grant them:
chmod +rx /opt//ocr.sh
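To confirm the permissions took effect, a check along these lines can be run from a shell of the account that runs Tomcat, since the -r and -x tests are evaluated for the current user. check_script_perms is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: succeed only if the script is readable and
# executable by the user running this check.
check_script_perms() {
  if [ -r "$1" ] && [ -x "$1" ]; then
    echo "OK: $1 is readable and executable"
  else
    echo "MISSING: run chmod +rx $1" >&2
    return 1
  fi
}
```

For example: check_script_perms path/to/ocr.sh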
6. Add the following properties to the alfresco-global.properties file located at
/tomcat/shared/classes/
Linux:
ocr.script=/opt//ocr.sh
ghostscript.exe=gs
Windows:
ocr.script=C:\\ocr.bat
ghostscript.exe=gs
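If you prefer to script this step, the sketch below sets a key in alfresco-global.properties, replacing an existing line or appending a new one. set_property is a hypothetical helper; it assumes the file already exists and that keys are written as key=value with no spaces around the equals sign. Values are passed to awk through the environment so that Windows values such as C:\\ocr.bat keep their backslashes.

```shell
# Hypothetical helper: set key=value in a properties file.
# Every line starting with "key=" is replaced; if none exists, the pair is
# appended. Lines like "key = value" or "key:value" are NOT recognised.
set_property() {
  local file=$1 key=$2 value=$3 tmp rc
  tmp=$(mktemp) || return 1
  # ENVIRON avoids awk's escape processing of -v values (backslashes survive)
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example: set_property alfresco-global.properties ghostscript.exe gs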
7. Start the Tomcat server
Linux:
./alfresco.sh start tomcat
Windows:
Run C:\\tomcat\bin\startup.bat
or use manager-windows.exe.
Note: Files that already exist in Alfresco will not be OCRed; upload new image files to test.
Important:
If a text file is created and contains the recognized text, Tesseract is working.
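To run this check against the script itself rather than tesseract alone, you can call it the way Alfresco would, with the source image as the first argument and a target .txt file as the second, and then look for a non-empty result. test_ocr_script is a hypothetical helper, and the sample image is whatever test image you have at hand.

```shell
# Hypothetical helper: call an OCR script with (source, target) arguments,
# as ocr.sh expects, and report whether it wrote a non-empty text file.
test_ocr_script() {
  local script=$1 image=$2 workdir target rc
  workdir=$(mktemp -d) || return 1
  target="$workdir/out.txt"
  "$script" "$image" "$target"
  if [ -s "$target" ]; then
    echo "OK: text produced:"
    cat "$target"
    rc=0
  else
    echo "FAIL: $target is missing or empty" >&2
    rc=1
  fi
  rm -rf "$workdir"   # clean up the temporary output
  return "$rc"
}
```

For example: test_ocr_script path/to/ocr.sh sample.png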
Comment here if your content is still not searchable. We are happy to hear about your ECM challenges, as we love solving them. Contact us!