python 2.7 - write.format adding extra quotes while writing to a tab delimited hdfs file in pyspark


I have a dataframe in PySpark; the schema and values are below.

Schema:

 |-- c1: string (nullable = true)
 |-- c2: string (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: string (nullable = true)

Data:

+--+--+---+--+
|c1|c2| c3|c4|
+--+--+---+--+
|78|93|   |10|
|12|97|   |20|
|23|93|   |10|
|78|93|   |40|
+--+--+---+--+

My column c3 contains 3 spaces. (To be more sure, I checked the length of the column and it came to 3.)
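
For reference, this is a minimal sketch of how that length check can be done (assuming the dataframe is called df1, as in the answer below):

from pyspark.sql.functions import length

# Every distinct length of c3 should come out as 3.
df1.select(length("c3").alias("c3_len")).distinct().show()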

Now when I try to write the dataframe to a tab-delimited file in HDFS, the data comes out as:

78  93  "   "  10
12  97  "   "  20
23  93  "   "  10
78  93  "   "  40

Note the quotes coming around column c3. I used the below command to write the dataframe:

outp.write.format('csv').option("delimiter", "\t").option("quotemode", None).save(path="path.txt")
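
For what it is worth, on Spark 1.6 the CSV writer comes from the spark-csv package, and (to my knowledge) its quoteMode option expects a string such as "NONE" rather than Python None; a sketch of the same write under that assumption:

# Same write, spelled out for Spark 1.6 with the spark-csv package on the classpath.
outp.write \
    .format('com.databricks.spark.csv') \
    .option("delimiter", "\t") \
    .option("quoteMode", "NONE") \
    .save("path.txt")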

I tried the solutions I could find, but none helped. I want to write column c3 with 3 spaces only, since I am dealing with a fixed-length file and cannot increase or decrease the length of the column.

Please, if anyone has a solution, help me with this.

Spark version I am using: 1.6.2

One workaround is to use a different character to substitute for the space:

df2 = df1.replace("   ","***","c3")
df2.show()

+---+---+---+---+
| c1| c2| c3| c4|
+---+---+---+---+
| 78| 93|***| 10|
| 12| 97|***| 20|
| 23| 93|***| 30|
+---+---+---+---+

You can substitute it back during a select operation on the dataframe.
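
A minimal sketch of the full round trip under that workaround (the regexp_replace-based select for swapping the spaces back is my own illustration, not part of the original answer):

from pyspark.sql.functions import regexp_replace

# Swap the 3-space value for a placeholder, then write the tab-delimited file.
df2 = df1.replace("   ", "***", "c3")
df2.write.format('csv').option("delimiter", "\t").save("path.txt")

# Later, substitute the spaces back during a select on the dataframe.
df3 = df2.select("c1", "c2",
                 regexp_replace("c3", "\\*\\*\\*", "   ").alias("c3"),
                 "c4")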

