Monday, 12 October 2015

IMPORT CSV OR ANY DELIMITED TEXT FILE WITH UNICODE ENCODINGS IN EXCEL USING VBA


This post demonstrates how to import CSV or any delimited text file with Unicode encoded in Excel using VBA code.
By default, text files are encoded in ANSI representation on WINDOWS platform.
When you try to import the ANSI encoded text files into excel using VBA, you may not see problems with the data appearing in the excel worksheet. But there is a problem when importing a text file other than ANSI encoding.

  The Possible Encodings of a Text File – 

 The following are the available character encoding representations of a text file on WINDOWS platform.
  • ANSI (default encoding representation)
  • Unicode
  • Unicode big endian
  • UTF-8
How to know the encoding of a text file?

text encodings

1. Open a text file with notepad editor.

 2. Select File menu and click on Save As

 3. Look at the Encoding at the bottom, which tells you the current encoding of a text file.

 4. Click on Encoding drop-down list, to see the supported encoding representations of a text file on your machine.



What are these encodings?

To understand more about these encodings, please check this link – Byte Order Mark (BOM)

What happens when Unicode encoded text is imported into Excel –

The below sample text file is used for testing. The copy of this file is saved with all possible encodings.

  DelimitedtextFile 4 Rows

 3 Columns

Delimited by ‘,‘ (Comma) 





The following is the output screen when each encoded text file is imported into Excel. When the import button is clicked, the data will be imported from the active cell in a worksheet.
  text encodings
If you look at the output screen, the every first cell of each encoding contains a garbage value.

Reason –

Byte Order Mark (or BOM) is a signal that tells the computer how the bytes are ordered in a Unicode document. Because Unicode can be used in the formats of 8, 16 and 32 bits – it is important for the computer to understand which encoding has been used in the Unicode document. BOM tells exactly the same to the computer. That is why we see the garbage values.

Representation of BOM by encoding –

This table illustrates how BOMs are represented as byte sequences and how they might appear in a text editor that is interpreting each byte as a legacy encoding (CP1252).

EncodingRepresentation (hexadecimal)Representation (decimal)Bytes as CP1252characters
UTF-8[t 1]EF BB BF239 187 191
UTF-16(BE)FE FF254 255þÿ
UTF-16(LE)FF FE255 254ÿþ
As per these representations, we can see the fixed characters for each Unicode encoding when the text file is imported.

How to Resolve –

This can be resolved by replacing the Unicode representation characters with an empty string.

Import Button Code –
1
2
3
Sub DoTheImport()
    ImportTextFile FName:="D:\data.txt", Sep:=","
End Sub
ImportFile is the sub-routine which imports the data. Change the sep:= as per your requirement. In this example comma is the separator.
ImportFile Code –
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
Public Sub ImportTextFile(FName As String, Sep As String)

Dim RowNdx As Long
Dim ColNdx As Integer
Dim TempVal As Variant
Dim WholeLine As String
Dim Pos As Integer
Dim NextPos As Integer
Dim SaveColNdx As Integer

'disable screen updates
Application.ScreenUpdating = False
'error handling
On Error GoTo EndMacro

'Importing data starts from the selected cell in a worksheet
SaveColNdx = ActiveCell.Column
RowNdx = ActiveCell.Row

'open the file in read mode
Open FName For Input Access Read As #1

'Read the file until End of Line
While Not EOF(1)
    'read line by line
    Line Input #1, WholeLine
    WholeLine = Replace(WholeLine, "", "", vbTextCompare)    'UTF-8
    WholeLine = Replace(WholeLine, "ÿþ", "", vbTextCompare)     'UTF-16 Unicode little endian
    WholeLine = Replace(WholeLine, "þÿ", "", vbTextCompare)     'UTF-16 Unicode big endian
    'checking if the line is empty
    If Right(WholeLine, 1) <> Sep Then
        WholeLine = WholeLine & Sep
    End If
    ColNdx = SaveColNdx
    Pos = 1
    NextPos = InStr(Pos, WholeLine, Sep)
    'finding each column data
    While NextPos >= 1
        TempVal = Mid(WholeLine, Pos, NextPos - Pos)
        Cells(RowNdx, ColNdx).Value = TempVal
        Pos = NextPos + 1
        ColNdx = ColNdx + 1
        NextPos = InStr(Pos, WholeLine, Sep)
    Wend
    RowNdx = RowNdx + 1
Wend

Close #1
Exit Sub

EndMacro:
Application.ScreenUpdating = True
MsgBox Err.Description, vbOKOnly, "Error"
Close #1

End Sub

Summary –

This way you can import any delimited text file with Unicode encodings. Make sure that the text file is not exceeding the maximum number of rows of excel worksheet.

Please share your thoughts on this post in the comments section.

Karthik Byggari

Author & Editor

Computer Science graduate, Techie, Founder of logicallyproven, Love to Share and Read About pprogramming related things.

0 comments:

Post a Comment

 
biz.