Airflow installation and timezone issues

Installing from source

The Airflow 1.9.0 release has a known timezone bug. According to Stack Overflow, the bug has already been fixed on the git master branch, so clone Airflow's master branch and install from the local source:

git clone https://github.com/apache/incubator-airflow.git
cd incubator-airflow

Once inside the project directory, "pip install ." installs from the local source; the all extra pulls in every optional component, including hive, mysql, and so on:

pip install ".[all]"

Errors may occur during installation; resolve them one by one based on the error messages and your environment.

You can also install only the components you actually need:

pip install ".[mysql, hive, celery]"

Timezone configuration

According to the timezone documentation: "Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database. It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the end user’s time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators are not converted. Time zone information is exposed and it is up to the writer of DAG what do with it." In other words, times shown in the Web UI are not converted to the local timezone; you simply attach the time zone information yourself when writing DAGs.

cd airflow
vim airflow.cfg

Change the following setting:

default_timezone = Asia/Shanghai

Verify that the configuration has taken effect:

>>> from airflow.utils import timezone
>>> a_date = timezone.datetime(2018, 6, 1)
>>> a_date

Output:

datetime.datetime(2018, 6, 1, 0, 0, tzinfo=<Timezone [Asia/Shanghai]>)
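As a standard-library-only illustration of what that timezone-aware value means (using zoneinfo from Python 3.9+ rather than Airflow's helpers), the same datetime converts to UTC like this:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# The same aware datetime as above, built with the stdlib
a_date = datetime(2018, 6, 1, tzinfo=ZoneInfo("Asia/Shanghai"))
print(a_date.isoformat())                           # 2018-06-01T00:00:00+08:00
print(a_date.astimezone(timezone.utc).isoformat())  # 2018-05-31T16:00:00+00:00
```

This matches the documentation's statement that Airflow stores datetime information internally (and in the database) in UTC.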

Configuring MySQL

Point the metadata database at MySQL in airflow.cfg:

sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow
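The connection string follows SQLAlchemy's dialect://user:password@host:port/database layout; a quick stdlib sanity check of its parts (purely illustrative):

```python
from urllib.parse import urlsplit

# Split the SQLAlchemy connection URL into its components
url = urlsplit("mysql://airflow:airflow@localhost:3306/airflow")
print(url.scheme)                   # mysql
print(url.username, url.password)   # airflow airflow
print(url.hostname, url.port)       # localhost 3306
print(url.path.lstrip("/"))         # airflow  (database name)
```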
pip install cryptography

Next, generate a Fernet key for Airflow.
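A sketch of generating such a key with only the standard library: a Fernet key is 32 random bytes in urlsafe base64 (44 characters), which is exactly what the cryptography package's Fernet.generate_key() returns.

```python
import base64
import os

# 32 random bytes, urlsafe-base64 encoded: the Fernet key format.
# With the cryptography package installed, Fernet.generate_key()
# produces the same thing.
key = base64.urlsafe_b64encode(os.urandom(32))
print(key.decode())
```

Set the printed value as fernet_key in airflow.cfg; it is used to encrypt credentials stored in the metadata database.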

Configuring Celery mode

Celery needs Redis, so install the client library:

pip install redis

vim airflow.cfg
executor = CeleryExecutor

broker_url = redis://localhost:6379/0
celery_result_backend = redis://localhost:6379/1
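Note that the broker and the result backend point at the same Redis instance but at different logical databases (0 and 1), keeping task messages separate from task results. A small stdlib check of the two URLs (illustrative only):

```python
from urllib.parse import urlsplit

broker = urlsplit("redis://localhost:6379/0")
backend = urlsplit("redis://localhost:6379/1")

# Same Redis host and port, different logical databases
print(broker.hostname == backend.hostname, broker.port == backend.port)
print(broker.path, backend.path)  # /0 /1
```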