python 2.7 - write.format adding extra quotes while writing to a tab delimited hdfs file in pyspark
I have a dataframe in PySpark with the schema and data shown below.
Schema:
|-- c1: string (nullable = true)
|-- c2: string (nullable = true)
|-- c3: string (nullable = true)
|-- c4: string (nullable = true)
Data:
+---+---+---+---+
| c1| c2| c3| c4|
+---+---+---+---+
| 78| 93|   | 10|
| 12| 97|   | 20|
| 23| 93|   | 10|
| 78| 93|   | 40|
+---+---+---+---+
My column c3 contains 3 spaces. (To be more sure, I checked the length of the column and it came out as 3.)
Now when I try to write the dataframe to a tab delimited file in HDFS, the data comes out like this:
78 93 " " 10 12 97 " " 20 23 93 " " 10 78 93 " " 40
Those quotes are coming in column c3. I used the below command to write the dataframe:
outp.write.format('csv').option("delimiter", "\t").option("quotemode", None).save(path="path.txt")
I tried every solution I could find but none helped. I want to write column c3 with 3 spaces only, since I am dealing with a fixed length file and cannot increase or decrease the length of the column.
Please, if anyone has a solution, help me with this.
Spark version I am using: 1.6.2
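
For reference, a minimal sketch that reproduces the setup described above (this assumes a SQLContext named sqlContext and the spark-csv package on the classpath; the data and output path are illustrative):

from pyspark.sql import Row

# c3 holds exactly 3 spaces, matching the fixed-length layout described above.
rows = [Row(c1="78", c2="93", c3="   ", c4="10"),
        Row(c1="12", c2="97", c3="   ", c4="20")]
df1 = sqlContext.createDataFrame(rows)

# Writing as tab-delimited CSV wraps the whitespace-only c3 values in quotes.
df1.write.format('csv').option("delimiter", "\t").save("path.txt")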
One workaround is to use a different character as a substitute for the spaces:
df2 = df1.replace("   ", "***", "c3")
df2.show()

+---+---+---+---+
| c1| c2| c3| c4|
+---+---+---+---+
| 78| 93|***| 10|
| 12| 97|***| 20|
| 23| 93|***| 30|
+---+---+---+---+
You can substitute it back during a select operation on the dataframe.
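
As a minimal sketch of that substitute-back step (assuming regexp_replace from pyspark.sql.functions, available since Spark 1.5; df2 here stands for whichever dataframe holds the *** placeholder, e.g. after reading the written file back in):

from pyspark.sql import functions as F

# Turn the *** placeholder back into the original 3 spaces during a select,
# keeping the other columns untouched.
df3 = df2.select(
    "c1",
    "c2",
    F.regexp_replace(F.col("c3"), "\\*\\*\\*", "   ").alias("c3"),
    "c4",
)
df3.show()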