Using the Python pandas.DataFrame.to_gbq method - CJavaPy

DataFrame.to_gbq(destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=False, table_schema=None, location=None, progress_bar=True, credentials=None)

Write a DataFrame to a Google BigQuery table.

This function requires the pandas-gbq package.

See the "How to authenticate with Google BigQuery" guide for authentication instructions.

Parameters:

destination_table : str

Name of the table to be written, in the form dataset.tablename.
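The `dataset.tablename` format can be checked before calling `to_gbq`. The helper `split_destination_table` below is hypothetical (it is not part of pandas or pandas-gbq); it is a minimal sketch that only accepts exactly two non-empty, dot-separated parts, matching the format described above.

```python
from typing import Tuple


def split_destination_table(destination_table: str) -> Tuple[str, str]:
    """Split a 'dataset.tablename' string into (dataset, tablename).

    Hypothetical helper: only exactly two non-empty parts are accepted.
    """
    parts = destination_table.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            "destination_table must look like 'dataset.tablename', got %r"
            % destination_table
        )
    dataset, table = parts
    return dataset, table


print(split_destination_table("TestDataSet.TestTable"))  # ('TestDataSet', 'TestTable')
```

A string such as `'TestTable'` (no dataset) raises `ValueError` here instead of failing later during the upload.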

project_id : str, optional

Google BigQuery account project ID.

Optional when available from the environment.

chunksize : int, optional

Number of rows from the DataFrame to insert in each chunk.

Set to None to load the whole DataFrame at once.
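To make the effect of `chunksize` concrete, the sketch below computes the `(start, stop)` row ranges a DataFrame would be split into. It illustrates the semantics described above and is not pandas-gbq's internal code; the function name `chunk_ranges` is hypothetical. Each range could be used as `df.iloc[start:stop]`.

```python
from typing import List, Optional, Tuple


def chunk_ranges(n_rows: int, chunksize: Optional[int]) -> List[Tuple[int, int]]:
    """Return one (start, stop) row range per chunk.

    chunksize=None means a single chunk holding every row.
    """
    if chunksize is None:
        return [(0, n_rows)] if n_rows else []
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer or None")
    # The last chunk is shorter when n_rows is not a multiple of chunksize.
    return [
        (start, min(start + chunksize, n_rows))
        for start in range(0, n_rows, chunksize)
    ]


ranges = chunk_ranges(100001, 10000)
print(len(ranges))                 # 11
print(ranges[-1])                  # (100000, 100001)
print(chunk_ranges(100001, None))  # [(0, 100001)]
```

With the 100,001-row DataFrame used in the example at the end of this page and `chunksize=10000`, the upload is split into 11 chunks, the last of which holds a single row.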

reauth : bool, default False

Force Google BigQuery to re-authenticate the user.

This is useful if multiple accounts are used.

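if_exists : str, default 'fail'

Behavior when the destination table exists. Value can be one of:

'fail':

If the table exists, raise pandas_gbq.gbq.TableCreationError.

'replace':

If the table exists, drop it, recreate it, and insert the data.

'append':

If the table exists, insert the data. Create it if it does not exist.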
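The three `if_exists` modes can be simulated on an in-memory dictionary of tables. This is a hypothetical sketch: `write_rows` is not a pandas-gbq function, and `TableCreationError` here is a local stand-in for `pandas_gbq.gbq.TableCreationError`. It only models the rules listed above (a missing table is always created).

```python
from typing import Dict, List


class TableCreationError(Exception):
    """Local stand-in for pandas_gbq.gbq.TableCreationError (hypothetical)."""


def write_rows(
    tables: Dict[str, List[dict]],
    name: str,
    rows: List[dict],
    if_exists: str = "fail",
) -> None:
    """Apply the if_exists rules to an in-memory dict of tables."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("if_exists must be 'fail', 'replace' or 'append'")
    if name in tables:
        if if_exists == "fail":
            raise TableCreationError("Table %r already exists" % name)
        if if_exists == "replace":
            tables[name] = []  # drop the table and recreate it empty
        # 'append': keep the existing rows
    else:
        tables[name] = []  # table does not exist yet: create it
    tables[name].extend(rows)


tables: Dict[str, List[dict]] = {}
write_rows(tables, "TestDataSet.TestTable", [{"a": 1}])             # created
write_rows(tables, "TestDataSet.TestTable", [{"a": 2}], "append")   # now 2 rows
write_rows(tables, "TestDataSet.TestTable", [{"a": 3}], "replace")  # back to 1 row
print(tables)  # {'TestDataSet.TestTable': [{'a': 3}]}
```

Calling `write_rows` again with the default `'fail'` would now raise `TableCreationError`, because the table exists.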

auth_local_webserver : bool, default False

Use the local webserver flow instead of the console flow when getting user credentials.

New in version 0.2.0 of pandas-gbq.

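table_schema : list of dicts, optional

List of BigQuery table fields to which the corresponding DataFrame columns conform, e.g. [{'name': 'col1', 'type': 'STRING'},...]. If a schema is not provided, it will be generated according to the dtypes of the DataFrame columns.

See the BigQuery API documentation for the available names of a field.

New in version 0.3.1 of pandas-gbq.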
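The sketch below shows how a `table_schema` list of the form above can be derived from `DataFrame.dtypes`. The dtype-kind to BigQuery-type mapping is a simplified assumption (the mapping pandas-gbq actually uses differs between versions), and `schema_from_dataframe` is a hypothetical helper, not a pandas or pandas-gbq function.

```python
from typing import Dict, List

import pandas as pd

# Simplified numpy dtype.kind -> BigQuery type mapping (an assumption; the
# real pandas-gbq mapping varies between versions). Unknown kinds -> STRING.
_KIND_TO_BQ_TYPE = {
    "i": "INTEGER",
    "u": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}


def schema_from_dataframe(df: pd.DataFrame) -> List[Dict[str, str]]:
    """Build a table_schema-style list of {'name', 'type'} dicts from df.dtypes."""
    return [
        {"name": str(column), "type": _KIND_TO_BQ_TYPE.get(dtype.kind, "STRING")}
        for column, dtype in df.dtypes.items()
    ]


df = pd.DataFrame({"col1": ["x", "y"], "col2": [1, 2], "col3": [0.5, 1.5]})
schema = schema_from_dataframe(df)
print(schema)
# [{'name': 'col1', 'type': 'STRING'}, {'name': 'col2', 'type': 'INTEGER'}, {'name': 'col3', 'type': 'FLOAT'}]
```

A list built this way (or written by hand) can be edited before the upload, for example to change one column's type, and then passed as `df.to_gbq('TestDataSet.TestTable', table_schema=schema)`.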

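location : str, optional

Location where the load job should run.

See the BigQuery locations documentation for a list of available locations.

The location must match that of the target dataset.

New in version 0.5.0 of pandas-gbq.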

progress_bar : bool, default True

Use the tqdm library to show a chunk-by-chunk progress bar for the upload.

New in version 0.5.0 of pandas-gbq.

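credentials : google.auth.credentials.Credentials, optional

Credentials for accessing Google APIs. Use this parameter to override the default credentials, for example to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

New in version 0.8.0 of pandas-gbq.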
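A minimal sketch of passing explicit service-account credentials, assuming the google-auth package is installed and that `key_path` points to a service-account JSON key file. `google.oauth2.service_account.Credentials.from_service_account_file` is a real google-auth constructor; the wrapper `upload_with_service_account` and its argument names are hypothetical.

```python
from typing import Optional


def upload_with_service_account(
    df,
    destination_table: str,
    project_id: str,
    key_path: str,
    location: Optional[str] = None,
) -> None:
    """Upload df to BigQuery using service-account credentials (hypothetical wrapper)."""
    # Imported inside the function so that merely defining it does not
    # require google-auth to be installed.
    from google.oauth2 import service_account

    # key_path: path to a service-account JSON key file (assumption).
    credentials = service_account.Credentials.from_service_account_file(key_path)
    df.to_gbq(
        destination_table,
        project_id=project_id,
        credentials=credentials,  # overrides the default credentials
        location=location,        # must match the target dataset's location
        if_exists="append",
    )
```

A call would look like `upload_with_service_account(df, 'TestDataSet.TestTable', 'my-project', 'path/to/key.json', location='US')`, where the project ID and key path are placeholders.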

New in pandas version 0.24.0.

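Example: timing an upload with to_gbq against an upload through the Cloud Datalab storage and BigQuery APIs (the second alternative only runs inside a Datalab notebook):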

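# The datalab package only works inside a Cloud Datalab notebook, and the
# %storage line below is a Datalab IPython magic, not plain Python.
from datalab.context import Context
import datalab.storage as storage
import datalab.bigquery as bq
import pandas as pd
import time

# DataFrame to write: 100,001 rows of [1, 2, 3]
# (lists, not sets, so that the column order is well defined)
my_data = [[1, 2, 3]]
for i in range(0, 100000):
    my_data.append([1, 2, 3])
not_so_simple_dataframe = pd.DataFrame(data=my_data, columns=['a', 'b', 'c'])

# Alternative 1: DataFrame.to_gbq (requires pandas-gbq)
start = time.time()
not_so_simple_dataframe.to_gbq('TestDataSet.TestTable',
                               Context.default().project_id,
                               chunksize=10000,
                               if_exists='append',
                               progress_bar=False)  # to_gbq has no verbose argument
end = time.time()
print("time alternative 1 " + str(end - start))

# Alternative 2: Datalab storage and BigQuery APIs
start = time.time()
sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'
bigquery_dataset_name = 'TestDataSet'
bigquery_table_name = 'TestTable'

# Define the storage bucket and create it if it does not exist
sample_bucket = storage.Bucket(sample_bucket_name)
if not sample_bucket.exists():
    sample_bucket.create()

# Define the BigQuery dataset and table; create the dataset if needed
dataset = bq.Dataset(bigquery_dataset_name)
table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)
if not dataset.exists():
    dataset.create()

# Create the table, or overwrite it if it already exists
table_schema = bq.Schema.from_dataframe(not_so_simple_dataframe)
table.create(schema=table_schema, overwrite=True)

# Write the DataFrame to GCS (Google Cloud Storage)
%storage write --variable not_so_simple_dataframe --object $sample_bucket_object

# Write the DataFrame to the BigQuery table
table.insert_data(not_so_simple_dataframe)
end = time.time()
print("time alternative 2 " + str(end - start))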
