Importing your own scanned text into LingQ

I love using LingQ for my language learning, and to make the most of it, I wanted to import one of my favourite French books that I have only in hardcopy. Whilst the LingQ process for importing text is as easy as could be, preparing the text involved a bit of trial and error, so I thought I would share what I do.

The book is La délicatesse by David Foenkinos, which is full of vocabulary I’d like to internalise. By the way, I love the movie with Audrey Tautou too.

Step 1: Scanning

I found it best to create individual scans of each double page, or to scan to a multi-page PDF. It’s important not to scan with OCR (optical character recognition) turned on, so that you just have a plain PDF without any text layer behind the image. On the scanner I used, the option is called a “non-editable PDF” – a slight misnomer, but anyway. If you produce a scan with OCR, you may later have trouble overriding the default underlying text output – unless of course you can change the scanner’s OCR language to French, but with my office equipment that wasn’t possible.

Step 2: OCR process

I use a program called Nitro, but Adobe Acrobat would probably be similar. After importing the PDF, I go to Review > OCR > Options > Advanced, and select French as the recognition language. Then click OK, and the text recognition process only takes a few seconds.

Step 3: Create text document

Still in Nitro, I go to Convert > To Word > Convert. A Word doc with the text opens up. In my experience there are some anomalies in the placement of some blocks of text, so I continue…

Step 4: Cleaning the text

I copy and paste the text in the correct order into a Notepad document. The purpose of this is to strip out any weirdness that comes from the Word doc.

Step 5: Assembling the text

I then copy and paste the text from Notepad into a WordPad document, which is easier to work with than Notepad. In WordPad I get rid of any OCR errors, for example hyphens that have magically turned into bullet points, or unnecessary spaces, etc.

It could be easier to edit in Word though. If you want to get rid of double or even triple spaces between words, you can do an easy find/replace.
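If you are comfortable running a small script, the same clean-up can be automated. This is only a rough sketch of the idea, not part of my actual workflow: the function name clean_ocr_text is made up for illustration, and it assumes that every bullet point in the text is a misread hyphen, which may not hold for every book.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Tidy a few common OCR artefacts in plain text (illustrative sketch)."""
    # Assumption: every bullet character is a hyphen the OCR misread.
    text = text.replace("\u2022", "-")
    # Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Drop any spaces left dangling at the end of a line.
    text = re.sub(r" +$", "", text, flags=re.MULTILINE)
    return text

if __name__ == "__main__":
    sample = "\u2022 Bonjour,   dit-elle.  \nOui."
    print(clean_ocr_text(sample))
```

You would still want to read through the result by eye, since a script like this cannot spot misrecognised letters or accents.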

Save the file as a LingQ-compatible DOCX file if you import the whole book in one go.

Step 6: Importing the text

You can then use the Import ebook function in LingQ to easily create a lesson with the text you scanned.

Or you can create a new course and then add each chapter as a lesson.
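If you go the chapter-by-chapter route, splitting the cleaned text by hand can get tedious. Here is a small sketch of how it could be done with a script. It assumes that each chapter starts with a line containing nothing but the chapter number, which may not match your book, and the function name split_chapters is purely illustrative. Stray page numbers sitting alone on a line would also be treated as chapter breaks, so remove those first.

```python
import re

def split_chapters(text: str) -> list[str]:
    """Split text into chapters (illustrative sketch).

    Assumption: a chapter begins with a line that holds only a number.
    """
    parts = re.split(r"^[ \t]*\d+[ \t]*$", text, flags=re.MULTILINE)
    # Discard empty pieces, e.g. anything before the first chapter number.
    return [part.strip() for part in parts if part.strip()]

if __name__ == "__main__":
    book = "1\nPremier texte.\n\n2\nDeuxième texte.\n"
    for number, chapter in enumerate(split_chapters(book), start=1):
        print(f"Lesson {number}: {chapter}")
```

Each piece can then be pasted into its own lesson in the course.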