python 2.7 - write.format adding extra quotes while writing to a tab delimited hdfs file in pyspark
I have a dataframe in PySpark with the schema and data shown below.
Schema:
|-- c1: string (nullable = true)
|-- c2: string (nullable = true)
|-- c3: string (nullable = true)
|-- c4: string (nullable = true)
Data:
+---+---+---+---+
| c1| c2| c3| c4|
+---+---+---+---+
| 78| 93|   | 10|
| 12| 97|   | 20|
| 23| 93|   | 10|
| 78| 93|   | 40|
+---+---+---+---+
My column c3 contains 3 spaces. (To be more sure, I checked the length of the column and it came out as 3.)
Now when I try to write the dataframe to a tab delimited file in HDFS, the data comes out like this:
78 93 " " 10 12 97 " " 20 23 93 " " 10 78 93 " " 40
Those quotes are coming in column c3. I used the below command to write the dataframe:
outp.write.format('csv').option("delimiter", "\t").option("quotemode", None).save(path="path.txt")
I tried every solution I could find but none helped. I want to write column c3 with 3 spaces only, since I am dealing with a fixed length file and cannot increase or decrease the length of the column.
Please, if anyone has a solution, help me with this.
Spark version I am using: 1.6.2
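
For reference, a minimal sketch that reproduces the setup described above (this assumes a SQLContext named sqlContext and the spark-csv package on the classpath; the data and output path are illustrative):

from pyspark.sql import Row

# c3 holds exactly 3 spaces, matching the fixed-length layout described above.
rows = [Row(c1="78", c2="93", c3="   ", c4="10"),
        Row(c1="12", c2="97", c3="   ", c4="20")]
df1 = sqlContext.createDataFrame(rows)

# Writing as tab-delimited CSV wraps the whitespace-only c3 values in quotes.
df1.write.format('csv').option("delimiter", "\t").save("path.txt")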
One workaround is to use a different character as a substitute for the spaces:
df2 = df1.replace("   ", "***", "c3")
df2.show()

+---+---+---+---+
| c1| c2| c3| c4|
+---+---+---+---+
| 78| 93|***| 10|
| 12| 97|***| 20|
| 23| 93|***| 30|
+---+---+---+---+
You can substitute it back during a select operation on the dataframe.
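
As a minimal sketch of that substitute-back step (assuming regexp_replace from pyspark.sql.functions, available since Spark 1.5; df2 here stands for whichever dataframe holds the *** placeholder, e.g. after reading the written file back in):

from pyspark.sql import functions as F

# Turn the *** placeholder back into the original 3 spaces during a select,
# keeping the other columns untouched.
df3 = df2.select(
    "c1",
    "c2",
    F.regexp_replace(F.col("c3"), "\\*\\*\\*", "   ").alias("c3"),
    "c4",
)
df3.show()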