How to write to an existing Excel file without overwriting data (using pandas)

kakaobank 2023. 4. 27. 22:37

I use pandas to write to an Excel file in the following way:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

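Masterfile.xlsx already consists of a number of different tabs. However, it does not yet contain "Main".

Pandas writes to the "Main" sheet correctly, but unfortunately it also deletes all the other tabs.

The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter gives a clue that something like this might work: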

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Update: starting from pandas 1.3.0 the following function will not work properly, because DataFrame.to_excel() and pd.ExcelWriter() have changed: a new if_sheet_exists parameter was introduced, which invalidates the function below.

An updated version of append_df_to_excel() that works with pandas 1.3.0+ can be found here.

Here is a helper function:

import os
import pandas as pd  # needed for pd.ExcelWriter below
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Tested with the following versions:

pandas 1.2.3
openpyxl 3.0.5
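For pandas versions where the helper above no longer works, the same "append below the existing rows" behaviour can be sketched with the if_sheet_exists parameter mentioned in the update. This is a minimal sketch, not the updated append_df_to_excel() referred to above: it assumes pandas 1.4 or later (where if_sheet_exists accepts "overlay") with openpyxl installed, and the helper name append_rows and the demo file are hypothetical.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Append df below the existing data in sheet_name (hypothetical helper)."""
    # mode="a" loads the existing workbook; "overlay" writes into an
    # existing sheet instead of raising or replacing it (pandas >= 1.4).
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        book = writer.book
        if sheet_name in book.sheetnames:
            # max_row is 1-based, startrow is 0-based, so this is the
            # first free row below the existing data.
            startrow = book[sheet_name].max_row
            header = False
        else:
            startrow = 0
            header = True
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=header, index=False)


# Demo on a throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
pd.DataFrame({"a": [1, 2]}).to_excel(path, sheet_name="Sheet1", index=False)
append_rows(path, pd.DataFrame({"a": [3, 4]}))
result = pd.read_excel(path, sheet_name="Sheet1")
```

The header is written only when the sheet is new, so repeated calls keep a single header row at the top.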

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer)
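Applied to the original question, the mode='a' approach might look like the sketch below. The workbook, the "Existing" sheet and the contents of data_filtered are made up here so that the example runs on its own; it assumes a pandas version that supports mode='a' and that openpyxl is installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical workbook with one pre-existing sheet, created only for the demo.
path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")
pd.DataFrame({"x": [1, 2]}).to_excel(path, sheet_name="Existing", index=False)

# Stand-in for the question's data_filtered DataFrame.
data_filtered = pd.DataFrame({"Diff1": [0.1, 0.2], "Diff2": [1.0, 2.0]})

# mode="a" opens the existing workbook instead of starting a new one,
# so the sheets already in the file are kept.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    data_filtered.to_excel(writer, sheet_name="Main",
                           columns=["Diff1", "Diff2"])

# Check which sheets the file contains afterwards.
sheet_names = load_workbook(path).sheetnames
```

After this runs, sheet_names lists both the pre-existing sheet and the new "Main" sheet.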

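I know this is an older thread, but it is the first item you find when searching, and the solutions above do not work if you need to retain charts in a workbook you have already created. In that case xlwings is a better option: it lets you write to the Excel workbook and keeps the charts and chart data.

Simple example: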

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

# save the updated workbook under a new name
wb.save('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

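This is an old question, but I am guessing some people still search for it, so...

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, which pandas creates with the sheetname=None option. Between reading the spreadsheet into the dict and writing it back from the dict, it is simple to add, delete or modify worksheets. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: later versions of pandas (0.21.0+) rename the "sheetname" parameter to "sheet_name".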

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

So upgrade your pandas now:

pip install --upgrade pandas

The @MaxU solution does not work with updated versions of Python and related packages. It raises the error: "zipfile.BadZipFile: File is not a zip file".
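Some background on that error: an .xlsx file is a zip archive, and openpyxl's load_workbook raises zipfile.BadZipFile when the file it is given is not a valid zip, for example an empty file. The sketch below shows a standard-library check that could be run before loading. The helper name looks_like_xlsx is hypothetical, and the zero-byte file is only an assumed way of reproducing the condition, not necessarily what happens inside pandas.

```python
import os
import tempfile
import zipfile


def looks_like_xlsx(path):
    """Return True if path exists and is a zip archive, as .xlsx files are."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # A zero-byte file with an .xlsx name is not a zip archive.
    empty_path = os.path.join(tmp, "empty.xlsx")
    open(empty_path, "wb").close()
    empty_ok = looks_like_xlsx(empty_path)

    # Any real zip archive passes the check (this one is not a real
    # workbook, it only demonstrates the container format test).
    zip_path = os.path.join(tmp, "archive.xlsx")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("dummy.txt", "x")
    zip_ok = looks_like_xlsx(zip_path)
```

The check only confirms the container format. A file can pass it and still not be a usable workbook.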

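I wrote a new version of the function that works with updated versions of Python and related packages, and tested it with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3.

I also added more features to the helper function:

Columns are now resized based on cell content width, so all variables are visible (see "resizeColumns").
NaN handling: you can choose whether NaN is displayed as NaN or as an empty cell (see "na_rep").
Added "startcol", so you can choose the column to start writing from; otherwise writing starts at col = 0.

Here is the function: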

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True. Resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want NaN to be displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start writing from a specific column, otherwise it will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        # check that the file exists and is readable, then close it again
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook with the target sheet
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            # columns are 1-based; + 1 so the last column is resized too
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Usage examples:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx"   

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE 
  
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
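The column-resizing rule used by the helper above (longest cell text in a column, plus a margin of 2) can be tried out in isolation. This is a minimal pure-Python sketch: column_widths and the sample rows are hypothetical, and the function works on plain tuples rather than on an openpyxl worksheet.

```python
def column_widths(rows, padding=2):
    """Return one width per column: longest str(value) in it, plus padding."""
    widths = []
    for row in rows:
        for i, value in enumerate(row):
            length = len(str(value))
            if i >= len(widths):
                # first value seen for this column
                widths.append(length)
            elif length > widths[i]:
                widths[i] = length
    return [w + padding for w in widths]


# Header row followed by data rows, as they would appear in the sheet.
rows = [
    ("numbers", "colors"),
    (1, "red"),
    (2, "white"),
]
widths = column_widths(rows)

# With openpyxl, each width would then be assigned as in the helper above:
#   ws.column_dimensions[get_column_letter(i)].width = width   (i starts at 1)
```

Like the helper, this measures str(value), so an empty cell (None) counts as the four characters of "None".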

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

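This works perfectly fine; the only problem is that the formatting of the master file (the file to which the new sheet is added) is lost.

writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)

I hope "keep_date_col" helps you.

I used the answer described here: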

from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(xlsFilename, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

The solution of @MaxU worked very well. I have just one suggestion:

If truncate_sheet=True is specified, then "startrow" should NOT be taken from the existing sheet. I suggest:

        if startrow is None and sheet_name in writer.book.sheetnames:
            if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
                startrow = writer.book[sheet_name].max_row
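The suggested rule can be written out as a small standalone function to make the cases explicit. This is only a sketch of the decision logic: resolve_startrow and its parameters are hypothetical names, and max_row stands for the value that writer.book[sheet_name].max_row would return.

```python
def resolve_startrow(startrow, sheet_exists, max_row, truncate_sheet):
    """Pick the row to start writing at (0-based, as in DataFrame.to_excel)."""
    if startrow is not None:
        # an explicit startrow always wins
        return startrow
    if sheet_exists and not truncate_sheet:
        # append below the existing data
        return max_row
    # new sheet, or existing sheet that is being truncated: start at the top
    return 0


append_case = resolve_startrow(None, True, 12, False)
truncate_case = resolve_startrow(None, True, 12, True)
explicit_case = resolve_startrow(5, True, 12, True)
new_sheet_case = resolve_startrow(None, False, 0, False)
```

Only the second case differs from the original helper, which would have started a truncated sheet at the old max_row.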

I would recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:

import xlwings as xw
import pandas as pd
import xlsxwriter

# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # print error message and stop if unable to get the current workbook
        print('Unable to grab the current Workbook')
        raise SystemExit(1)
    else:
        return activeWb

# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
    lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column))
    return str(lastRow), lastCol + str(lastRow)

activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

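This seems to suit my application very well, because .xlsm workbooks can be very tricky. You can run this as a Python script, or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.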

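You can write to an existing Excel file without overwriting its data by passing DataFrame.to_excel() a pd.ExcelWriter opened with the mode parameter set to 'a' (append mode). Note that mode is a parameter of pd.ExcelWriter, not of DataFrame.to_excel() itself.

Here is an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Write the DataFrame to an existing Excel file in append mode.
# If 'Sheet1' already exists, pandas 1.3+ raises ValueError unless
# if_sheet_exists is passed to ExcelWriter.
with pd.ExcelWriter('existing_file.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, index=False, sheet_name='Sheet1')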

Method:

Creates the file if it does not exist.
Appends to an existing Excel file according to the sheet name.

import pandas as pd
from openpyxl import load_workbook

def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl') 
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)

Usage:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])

Source URL: https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
